Did you try checking the task logs ?
There might be more details there ...
Regards,
Mridul
On Wednesday 09 March 2011 04:23 AM, Kris Coward wrote:
So I queued up a batch of jobs last night to run overnight (and into the
day a bit, owing to a bottleneck on the scheduler the way that things
In which case, can't you model that as a Bag ?
I imagine something like Tuple with fields person:chararray,
books_read:bag{ (name:chararray, isbn:chararray) }, etc ?
Of course, it will work as a bag if the tuple contained within it has a
fixed schema :-) (unless you repeat this process N nu
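Spelled out, the bag-based schema suggested above might look like this in Pig Latin (the relation and field names are illustrative, not from the original mail):

```pig
-- One tuple per person; the variable-length part is modeled as a bag
-- of tuples that all share one fixed schema.
person_books = LOAD 'books' AS (
    person: chararray,
    books_read: bag { t: (name: chararray, isbn: chararray) }
);
```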
As I elaborated before, given the state of the pig project, I would vote
"-1" on the next release being 1.0.
Of course, as mentioned, it is non-binding :-)
Regards,
Mridul
On Tuesday 08 March 2011 04:51 AM, Olga Natkovich wrote:
Hi guys,
We had a lively discussion last week regarding what version number
IMO, 1.0 for a product typically promises:
1) Reasonable stability of interfaces.
Typically only major version changes break interface compatibility.
While we are at 0.x, it seems to be considered 'ok-ish' to violate this,
but once you are at 1.0 and higher, breaking interface contracts will
Since XMLLoader does not seem to satisfy your requirements, and assuming
each line contains an XML document (which is required by XMLLoader
anyway, iirc), what you can do is write a simple UDF to handle this.
Use a line reader as the LoadFunc, and write a UDF which parses the input
line as a Docum
On Thursday 16 December 2010 03:58 AM, John Hui wrote:
The outputSchema is set to Long
@Override
public Schema outputSchema(Schema input) {
    return new Schema(new Schema.FieldSchema(
        getSchemaName(this.getClass().getName().toLowerCase(), input),
        DataType.CHARARRAY
That is a very nice tip, thanks !
Regards,
Mridul
On Friday 03 December 2010 02:49 PM, Anze wrote:
You could also try 'abc[|].*'. I find it is often easier (and less error-
prone) to use this principle than it is to escape the escaping character... :)
Just be careful with '-', it must be at
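As a hedged illustration of the tip above using Java's regex engine (these exact strings are not from the original mail):

```java
public class CharClassDemo {
    public static void main(String[] args) {
        // '|' loses its special meaning inside a character class,
        // so no escape character is needed.
        System.out.println("abc|def".matches("abc[|].*")); // true
        System.out.println("abcxdef".matches("abc[|].*")); // false
        // '-' is the exception: place it first or last in the class
        // so it is not read as a range.
        System.out.println("a-b".matches("a[-|]b")); // true
    }
}
```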
As of now, UDFs are limited to only Strings as constructor params.
Regards,
Mridul
On Thursday 02 December 2010 02:18 PM, Sheeba George wrote:
Hi Daniel
I have a related question. My UDF has a constructor that takes 2 params.

public TopUDF(int top, int type) {
    m_cnt = top;
    m_type
It would be a tradeoff between data-locality and the number of tasks
executed. In some of our experiments, it performed much worse (don't have
actual numbers, but it was in the 2x ballpark iirc); of course, ours was
a highly constrained and specialized experiment anyway !
On the other hand, th
[
https://issues.apache.org/jira/browse/PIG-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mridul Muralidharan updated PIG-1685:
-
Description:
We get the following exception, which seems to be related to processing
[
https://issues.apache.org/jira/browse/PIG-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923156#action_12923156
]
Mridul Muralidharan commented on PIG-1685:
--
Thanks guys, that was real q
Versions: 0.8.0
Reporter: Mridul Muralidharan
We get the following exception, which seems to be related to processing
counters per path :
java.net.URISyntaxException: Illegal character in path at index 71:
/projects/gridfaces/mridulm/doopdex/k_data_index/20100830_cdxcore_10.7_
[
https://issues.apache.org/jira/browse/PIG-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921908#action_12921908
]
Mridul Muralidharan commented on PIG-1684:
--
I am not sure if I understand
:
A custom StoreFuncInterface used to store data at the reducer.
(Output of a group )
Reporter: Mridul Muralidharan
Pig seems to be using multiple instances of StoreFuncInterface in the reducer
inconsistently.
Some hadoop api calls are made to one instance and others made to other
I did not follow your pig snippet ... it looks wrong (since the only
output is 'group').
Could you do an "order by" and then a "limit" ?
I can't remember offhand if "order by" works within a nested foreach (don't
have pig access right now to test, sorry).
If it is supported, something like might b
[
https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905859#action_12905859
]
Mridul Muralidharan commented on PIG-1309:
--
Condition (1) refers to only expl
Condition (1) refers only to explicit (user-specified) statements, right ?
Not implicit projects introduced by pig to conform to the schema ?
Regards,
Mridul
On Saturday 21 August 2010 12:59 AM, Ashutosh Chauhan (JIRA) wrote:
[
https://issues.apache.org/jira/browse/PIG-1309?page=com.atlass
Taking a guess, you could group things based on your criterion and
condition.
Something simple like :
a) group by usergroup (might be too expensive ? the number of records
across timestamps for users in a group might be large !).
b) group by (usergroup, timestamp / window) [this will lose acc
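Option (b) can be sketched directly in Pig Latin (the hourly window of 3600 seconds and the field names are illustrative assumptions):

```pig
records = LOAD 'events' AS (usergroup: chararray, ts: long, payload: chararray);
-- Bucketing the timestamp bounds each group to one window,
-- at the cost of losing cross-window accuracy.
grouped = GROUP records BY (usergroup, ts / 3600);
```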
sting for
us will be dismissed and not passed to the reducer part of the job, and
besides wouldn't the presence of null values affect the performance? For
example, if a2 would have many null values, then less values would be passed
too right?
Renato M.
2010/8/27 Mridul Muralidharan
On seco
On second thoughts, that part is obvious - duh
- Mridul
On Thursday 26 August 2010 01:56 PM, Mridul Muralidharan wrote:
But it does for COUNT(A.a2) ?
That is interesting, and somehow weird :)
Thanks !
Mridul
On Thursday 26 August 2010 09:05 AM, Dmitriy Ryaboy wrote:
I think if you do COUNT(A), Pig will not realize it can ignore a2 and
a3, and project all of them.
On Wed, Aug 25, 2010 at 4:31 PM, Mridul
One possibility might be some bug in the use of the combiner.
You could try disabling it and seeing if that works ...
Regards,
Mridul
On Wednesday 25 August 2010 01:42 AM, Wasti, Syed wrote:
Hi,
I have a very simple script and seeing a very strange behavior, getting
wrong results when running this scr
I am not sure why the second option is better - in both cases, you are
shipping only the combined counts from map to reduce.
On the other hand, the first could be better since it means we need to
project only 'a1' - and none of the other fields.
Or did I miss something here ?
I am not very familiar with wh
[
https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902392#action_12902392
]
Mridul Muralidharan commented on PIG-1321:
--
Is the merge prevented only if fla
[
https://issues.apache.org/jira/browse/PIG-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902350#action_12902350
]
Mridul Muralidharan commented on PIG-1518:
--
Might be a good idea to con
Are you using pig local mode ?
If yes, does this work with hadoop ?
Regards,
Mridul
On Friday 20 August 2010 12:05 AM, Matthew Smith wrote:
All,
I am running pig-0.7.0 and I have been running into an issue running the
ORDER command. I have attempted to run pig out of the box on 2 separate
L
[
https://issues.apache.org/jira/browse/PIG-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1292#action_1292
]
Mridul Muralidharan commented on PIG-1518:
--
if optimizer is turned off, does
[
https://issues.apache.org/jira/browse/PIG-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1280#action_1280
]
Mridul Muralidharan commented on PIG-365:
-
collecting only top k per mappe
lasses you create via
reflection.
Regards,
Mridul
Right now, my workaround is fairly robust but ugly - I am adding the
top-level jar to HADOOP-CLASSPATH. That jar lists a.jar, b.jar, ... in
the list of files in Class-Path in META-INF/MANIFEST.MF.
-sanjay
-Original Message-
From: Mridu
A short-term alternative would be to find out the order in which pig
expands the jars, and ensure that your jars are expanded in reverse order.
As in, if you need your classpath to be "a.jar:b.jar:c.jar", and pig
un-jars the register'ed jars in the order they are specified in the
script, the
You need the media framework, and would need to register those jars too
for pig to 'find' the relevant classes : looks like they might not be
part of the plain JDK ?
Regards,
Mridul
On Wednesday 11 August 2010 03:35 AM, Ifeanyichukwu Osuji wrote:
The UDF i am making uses JAI from the javax.medi
If I understood your problem right, you can use define to pass
parameters to the constructor and then use them (after populating them
into an instance field).
-- note, only Strings are accepted as parameters !
define MY_UDF org.me.udfp.MyUDF('param1', 'param2');
--- This will call the constructor
[
https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894388#action_12894388
]
Mridul Muralidharan commented on PIG-1530:
--
Can't edit comments .. to ad
[
https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894368#action_12894368
]
Mridul Muralidharan commented on PIG-1530:
--
This looks more like a developer co
D '$inputPath' using my_custom_loader();
describe raw;
--
Regards,
-Rohini
-----Original Message-
From: Mridul Muralidharan [mailto:mrid...@yahoo-inc.com]
Sent: Friday, July 30, 2010 4:24 AM
To: pig-user@hadoop.apache.org
Cc: Uppuluri, Rohini
Subject: Re: Pig Data types
Are you returning the appropriate schema and using the correct schema in
the pig script ?
More info might help though !
Regards,
Mridul
On Thursday 29 July 2010 09:16 PM, Uppuluri, Rohini wrote:
Hi all,
I have a strange issue with data types. We have a custom loader which
loads data from lo
On Thursday 29 July 2010 01:18 AM, Swati Jain wrote:
Hello Everyone,
I am trying to execute below mentioned script, but it is throwing error.
Script is:
A = load 'ex_groupby' USING PigStorage(',') as (a1:int,a2:int,a3:int);
G1 = GROUP A by (a1,a2);
describe G1;
D = Filter G1 by group.$0 > 1;
ay 28 July 2010 08:55 PM, Corbin Hoenes wrote:
Mridul -
What file format do you use to exchange data between pig and java? Text or
something else?
On Jul 25, 2010, at 1:52 PM, Mridul Muralidharan wrote:
In some of our pipelines, pig jobs are part of the pipeline - which consist of
other h
Hi,
We have a few projects which do this on hadoop, but I don't see any
reason why it can't be done in pig.
As Alan and Ashutosh mentioned, the image itself will be just bytearray
(and so you need your own loader, or in our case use a sequence file
loader) : but you can extract and pop
In some of our pipelines, pig jobs are part of the pipeline - which
consist of other hadoop jobs, shell executions, etc.
We currently do this by using intermediate file dumps.
Regards,
Mridul
On Friday 23 July 2010 10:45 PM, Corbin Hoenes wrote:
What are some strategies to have pig and j
chararray,start:
long}
modified schema:
sessions: {first::sid: chararray,first::infoid:
chararray,first::imei: chararray,first::start: long}
Do you know a workaround ?
Le 13/07/10 10:13, Mridul Muralidharan a écrit :
The flatten will return the same schema as before (in 'first') :
so u
PM, Vincent Barat wrote:
Yes. I would have used DISTINCT too, but I cannot, since some of
the other fields can be different (the timestamp actually).
Thanks for your help.
Le 13/07/10 11:06, Mridul Muralidharan a écrit :
I am not sure why the prefix 'first' is coming in ... someon
t::infoid:
chararray,first::imei: chararray,first::start: long}
Do you know a workaround ?
Le 13/07/10 10:13, Mridul Muralidharan a écrit :
The flatten will return the same schema as before (in 'first') :
so unless you are modifying the fields or the order in which they
are generated
xactly same as start of the code snippet
for 'sessions'.
Regards,
Mridul
On Tuesday 13 July 2010 01:01 PM, Vincent Barat wrote:
Le 12/07/10 16:56, Mridul Muralidharan a écrit :
I am not sure what you mean here exactly.
Will a sid row have multiple (different) values for the oth
I am not sure what you mean here exactly.
Will a sid row have multiple (different) values for the other fields ?
If not - that is, if you simply have duplicate rows - you can use
DISTINCT to achieve what you require :
sessions = DISTINCT sessions PARALLEL $PARALLELISM;
But if you wan
You will need to look at the lifecycle of a UDF to better understand this.
Typically they are created (note: one or more creations !) during plan
creation time (before job submission) and subsequently deserialized on
the various mapper/reducer nodes to be executed (iirc).
So typically what I ha
As an aside, if you are using Azkaban for the purpose of cron, etc. - you
might want to take a look at oozie : I think it has been released - and
iirc it is going to be opensourced too.
Regards,
Mridul
On Friday 25 June 2010 12:21 AM, Russell Jurney wrote:
Wrote a... thing about Pig at LinkedIn that
ally insert the
field definitions in my script before I run it. So in the example above
I would insert 'f1, f2, f3' everywhere I need to reference the tuple.
Another run might insert 'f1, f2' for an input that only has 2 extra fields.
On Thu, May 20, 2010 at 12:39 AM, Mridu
uld I access the items in the numbered fields
3..N where I don't know what N is? Are you suggesting I pass A to a
custom UDF to convert to a tuple of [time, count, rest_of_line]?
On Wed, May 19, 2010 at 4:11 PM, Mridul Muralidharan
mailto:mrid...@yahoo-inc.com>> wrote:
You can simply
You can simply skip specifying schema in the load - and access the
fields either through the udf or through $0, etc positional indexes.
Like :
A = load 'myfile' USING PigStorage();
B = GROUP A by round_hour($0) PARALLEL $PARALLELISM;
C = ...
Regards,
Mridul
On Thursday 20 May 2010 04:07
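Note that round_hour in the snippet above is a user-supplied UDF, not a builtin; assuming it takes an epoch-seconds timestamp, its core logic could be as simple as this (the method name and the seconds assumption are both hypothetical):

```java
class RoundHour {
    // Truncate an epoch-seconds timestamp down to its hour boundary
    // (assumption: the hypothetical round_hour UDF works on seconds).
    static long roundHour(long epochSeconds) {
        return epochSeconds - (epochSeconds % 3600L);
    }
}
```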
Hi,
Is there a way to parallelize copy of really large files ?
From my understanding, currently each map in distcp copies one file.
So for really large files, this would be pretty slow if number of files
is really large.
Thanks,
Mridul
[
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868155#action_12868155
]
Mridul Muralidharan commented on PIG-566:
-
Just to point out an error in the com
ing is actually spilled. This gets printed out even if there are
no spillable objects the Manager is aware of. An 8G map will certainly
trigger the GC.
On Fri, May 7, 2010 at 2:44 PM, Mridul Muralidharan
wrote:
Hi,
Do you know which snippet in the script is causing the issue ?
There are m
Hi,
Do you know which snippet in the script is causing the issue ?
There are multiple MR jobs which will be executed - which one is causing
the exact issue ?
Map-side spills are strange - are you sure it is not in the reducer ?
If it really is on the map side, I guess it is pointing to the case
I am not very sure what the runtime implications of some pig idioms are
(and I have a feeling it changes with the impl) ... including nested
foreach.
For example :
B = foreach A {
X0 = ...
X = .. work on X0 ...;
GENERATE X, udf1(X), udf2(X);
}
will cause X0/X to be evaluated multi
CROSS is not a join, it is simply a cartesian product.
Where did you see "cross join" ? Maybe I am missing something ...
Regards,
Mridul
On Wednesday 28 April 2010 07:51 AM, hc busy wrote:
guys, I'm looking at the doc's for CROSS join and noticed that it's not
really a cross join, more rather jus
Hi Alex,
This is a bug in pig imo, where it is pushing the filter before the
join when it should not.
To validate, simply introduce an intermediate store/load pair to see the
right results.
There probably already is a JIRA similar to this; if yes - please do
add to that, or please do crea
r issues).
---
A = load 'input' AS (src:chararray, tgt:chararray, sc1:int);
B = GROUP A by src PARALLEL $PARALLELISM;
C = FILTER B by NOT IsEmpty($1) ;
dump C
---
Sorry for the confusion
Regards,
Mridul
On Thursday 22 April 2010 01:17 AM, Mridul Muralidharan wrote:
Hi,
Ju
this, there is a pig construct iirc - "$2 IS NOT NULL" works, you don't
need the udf for that ...
-- and
T = filter U by my.udf.NOT(IsEmpty($3));
"IsEmpty($3) != false" ?
or "IsEmpty($3) != true" can replace NOT udf ?
Regards,
Mridul
it was for an older ver of pig
In the case of co-group, if nothing matched the group key, you get an empty
bag, not null.
So checking for COUNT(alias) == 0 is what you need.
Regards,
Mridul
On Wednesday 21 April 2010 03:37 PM, Alexander Schätzle wrote:
Hello,
I want to use IS NULL in a FILTER but the behavior seems to be
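The COUNT test described above would look like this in a script (the aliases and key name are illustrative):

```pig
J = COGROUP A BY k, B BY k;
-- A key missing from B gives an empty bag, not null, after COGROUP,
-- so test the bag's size rather than IS NULL.
only_in_a = FILTER J BY COUNT(B) == 0;
```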
You might want to be careful with this ... the udf could get used on
both the map and reduce sides, no ?
Regards,
Mridul
On Wednesday 31 March 2010 02:22 AM, Sandesh Devaraju wrote:
Hi All,
Is there a way to get current InputSplit in a UDF (more specifically,
a filter function)?
I have a filter f
On Tuesday 09 March 2010 04:13 AM, hc busy wrote:
okay. Here's the bag that I have:
{group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int,
number2:int}}
and I want to do this
grunt> CALCULATE= FOREACH TABLE_group GENERATE group, SUM(TABLE.number1 /
TABLE.number2);
TAB
On Saturday 06 March 2010 04:35 AM, hc busy wrote:
Guys, I have some data that has null bag. Looking at the COUNT.java it seems
that it is an error condition for the bag passed in to be null (instead of
zero for example.)
I tried to change it to an empty bag when it's null
data = FOREACH input
On Saturday 06 March 2010 04:47 AM, Thejas Nair wrote:
I am not sure why the rate at which output is generated is slowing down.
But cross in pig is not optimized it uses only one reducer. (a major
limitation if you are trying to process lots of data with a large cluster!)
CROSS is not suppos
Within the same map or reduce step - as subsequent operators.
I don't think it combines operators, but they are not different jobs - if
that is what you are worried about. Think of it like a pipeline ...
Regards,
Mridul
On Monday 01 March 2010 05:32 PM, prasenjit mukherjee wrote:
Thanks that will w
e separate
delimiters for fields,bags ? Basically the content of my file should
now be :
a b c {(15,good),(24,total),(9,bad)}
a b d {(2,bad),(6,good),(8,total)}
-Prasen
On Mon, Mar 1, 2010 at 2:23 AM, Mridul Muralidharan
wrote:
Your schema is essentially :
(stri
Slightly digressing and possibly rambling - feel free to ignore !
Making it a general problem: when both lists are 'large' (too large to
fit into memory).
A general solution for this - when the list of blacklisted emails is
itself large - is an interesting problem. Probably something which might
benefit from the
Just curious, what was the actual error with using filters within a
nested foreach ?
Will it be possible to show the snippet ? (and the schema of the input ?)
We are using this without issue right now, so curious what the problem
here is ..
Thanks,
Mridul
On Friday 26 February 2010 03:17 AM, zaki
You can get in touch with Arnab if you want more info on it ... I am
sure he will be very much interested to see others using it :-)
Regards,
Mridul
On Friday 26 February 2010 08:43 AM, prasenjit mukherjee wrote:
Any thoughts on including python-based UDFs like the following :
http://arnab
Note, as should be obvious, the new file will have the delimiter '\t'
and not ','.
To give us :
r1 = load '/tmp/prasen/foo1.txt_new' using PigStorage('\t') AS
(f1:chararray, f2:chararray,f3:chararray, B:{T1:(i1:int,s1:chararray)});
Regards,
Mridul
Your schema is essentially :
(string, string, string, bag).
With bag containing tuples with schema (number, string).
Based on this, the schema should be what you described second - namely :
r1 = load '/tmp/prasen/foo1.txt' using PigStorage(',') AS (f1:chararray,
f2:chararray,f3:chararray, B
Is this documented behavior or current impl detail ?
A lot of scripts broke when multi-query optimization was committed to
trunk because of the implicit ordering assumption (based on STORE) in
earlier pig - which was, iirc, documented.
Regards,
Mridul
On Thursday 11 February 2010 10:52 PM,
e group by, even if it's only null values.
I just wondered if there's anything to be done about the NPE to make it
more clear, that's all.
I guess you can see this as an eventual feature / improvement of some
sort, no problems :)
alex
On Tue, Feb 9, 2010 at 11:35 AM, Mridul Mura
On second thought, probably A itself is NULL - in which case you will
need a null check on A, and not on A.v (which, I think, is handled iirc).
Regards,
Mridul
On Tuesday 09 February 2010 04:02 PM, Mridul Muralidharan wrote:
Without knowing rest of the script, you could do something like
Without knowing rest of the script, you could do something like :
C = FOREACH B {
X = FILTER A BY v IS NOT NULL;
GENERATE group, (int)AVG(X) as statsavg;
};
I am assuming it is because there are nulls in your bag field.
Regards,
Mridul
On Tuesday 09 February 2010 03:52 PM, Alex Parvulescu wr
't want to write any random N rows to the table. I want to write
the
*top* N rows - meaning - I want to write the "key" values of the Reducer
in
descending order. Does this make sense? Sorry for the confusion.
On Wed, Jan 27, 2010 at 11:09 PM, Mridul Muralidharan <
mrid..
There is an error in the basic script - which I propagated in my copy
paste - corrected below.
Regards,
Mridul
Mridul Muralidharan wrote:
There are two ways to handle this.
You can pass it along as a parameter as you did in the script - though
note that, in your udf, it will be a tuple
There are two ways to handle this.
You can pass it along as a parameter as you did in the script - though
note that, in your udf, it will be a tuple with first field == category,
second field == "110".
public Boolean exec(Tuple _input) throws IOException {
String input = (String)_input.
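Since only Strings reach the constructor, the UDF has to parse numeric parameters itself. A minimal plain-Java sketch of that pattern (a real UDF would extend Pig's EvalFunc, which is omitted here to keep the sketch self-contained; the field names follow the earlier snippet):

```java
class TopUDF {
    private final int m_cnt;
    private final int m_type;

    // A script line like: DEFINE MY_TOP my.pkg.TopUDF('10', '2');
    // would invoke this constructor. Pig hands the arguments over as
    // Strings, so numeric parameters must be parsed explicitly.
    TopUDF(String top, String type) {
        m_cnt = Integer.parseInt(top);
        m_type = Integer.parseInt(type);
    }

    int getCnt() { return m_cnt; }
    int getType() { return m_type; }
}
```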
A possible solution is to emit only N rows from each mapper and then use
1 reduce task [*] - if the value of N is not very high.
So you end up with at most m * N rows at the reducer instead of the full
input set - and so the limit can be done more easily.
If you're ok with some sort of variance in the number of r
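The per-mapper cap can be kept with a bounded min-heap; this is a plain-Java sketch of the idea (not Pig's actual LIMIT implementation, and the class/method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class BoundedTopN {
    // Each mapper keeps at most n values in a min-heap; with m mappers,
    // the single reducer then sees at most m * n rows instead of the
    // whole input set.
    static List<Integer> topN(List<Integer> values, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            heap.add(v);
            if (heap.size() > n) {
                heap.poll(); // evict the current minimum
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }
}
```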
Jeff Zhang wrote:
*See my comments below*
On Mon, Jan 25, 2010 at 3:22 PM, Something Something <
mailinglist...@gmail.com> wrote:
If I set # of reduce tasks to 1 using setNumReduceTasks(1), would the class
be instantiated only on one machine.. always? I mean if I have a cluster
of
say 1 maste
On Tue, Jan 26, 2010 at 3:08 PM, Mridul Muralidharan
wrote:
Jeff Zhang wrote:
*See my comments below*
On Mon, Jan 25, 2010 at 3:22 PM, Something Something <
mailinglist...@gmail.com> wrote:
If I set # of reduce tasks to 1 using setNumReduceTasks(1), would the
class
be instantiat
If each line from your file has to be processed by a different mapper -
other than by writing a custom slicer, a very dirty hack would be to :
a) create N files with one line each.
b) Or, do something like :
input_lines = load 'my_s3_list_file' as (location_line:chararray);
grp_op = G
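One hedged way the hack might continue (process_line is a hypothetical UDF, and note this pushes the per-line work to the reduce side rather than to separate mappers):

```pig
input_lines = LOAD 'my_s3_list_file' AS (location_line: chararray);
-- Grouping each (unique) line by itself spreads the single-line groups
-- across reducers via PARALLEL.
grp_op = GROUP input_lines BY location_line PARALLEL 20;
results = FOREACH grp_op GENERATE FLATTEN(process_line(group));
```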
The only other suggestion I can make, other than what has already been
mentioned by others, is to parameterize the PARALLEL value - so that you
use optimal number of reducers for the test (depending on the cluster
size and the number of reducers per node).
Regards,
Mridul
Rob Stewart wrote
y much for digging in here, a second set of eyes is handy.
-clint
On Tue, Jan 19, 2010 at 1:37 AM, Mridul Muralidharan
wrote:
Clint Morgan wrote:
After the 2PC process has determined that a commit should happen there is
no
roll-back. The commit must be processe
it failure in a indexed regionserver
does a rollback of the txn, then the issue I mentioned can occur ?
Thanks for your patience and time !
Regards,
Mridul
-clint
On Fri, Jan 15, 2010 at 2:43 AM, Mridul Muralidharan
wrote:
I think I might not have explained it well enough.
As part of execu
risen =) I should probably start a Google group or
something.
T#
On Fri, Jan 15, 2010 at 11:56 AM, Mridul Muralidharan
wrote:
This looks really promising Theo !
Is there some mailing list where discussions & queries related to piglet are
discussed ?
Thanks,
Mridul
Theo Hultberg wrote:
Hi,
I've written a Ruby DSL for writing Pig scripts, which I hope might
interest some of you. It makes it possible to do a lot of
int
On Sun, Jan 3, 2010 at 4:46 PM, Mridul Muralidharan
wrote:
stack wrote:
On Sun, Jan 3, 2010 at 10:46 AM, Mridul Muralidharan
wrote:
I was wondering about the atomicity guarantees when using secondary
indexes from within a transaction.
You are talking about indexed hbase from transact
To add to what Dmitriy described - just add a project to pick the
columns you need.
c = join a by filename, b by filename PARALLEL $MY_PARALLELISM;
--- Please check this syntax though with pig latin docs.
d = foreach c generate a::filename; --- Or anything else you want to pick.
if you ne
Chris Hartjes wrote:
My apologies if this is the wrong mailing list to ask this question. I've
started playing around with Pig and Hadoop, with the intention of using it
to do some analysis of a collection of MySQL slow query log files. I am not
a Java programmer (been using PHP for a very long
To add to what Mathew clarified - you will need to send that empty
request when the server has responded to all your requests.
This typically happens when :
a) the request was held at the CM for the max configured time.
b) the CM/server had something to send to the client.
Mridul
--- On Mon, 11/1/10,
Hi,
Is there a way to specify chained mappers with multiple inputs ?
Essentially building a pipeline of mappers based on the different
inputs involved ?
For something like :
current_data (seq_files) -> process -> emit key, value.
new_data (text_files) -> sanitize -> preprocess -> proce
12:26 AM, Mridul Muralidharan
wrote:
Hi,
This is assuming there is no easier way to do it (someone from the hbase
team can comment better !).
But the usual way to handle this in mapreduce is to create a composite
input format : which delegates to the underlying formats to generate the
splits, and
like same thing happen there too.
>
> Since this seemed like a relevant ongoing thread, i though
> i would clear my point here.
> Is this how it should be?
>
> Abhinav Singh,
> Bangalore,
> India
> http://abhinavsingh.com/blog
>
> From:
> Mridul Muralidharan
--- On Sat, 9/1/10, Peter Saint-Andre wrote:
> From: Peter Saint-Andre
> Subject: Re: [BOSH] Pipelining / avoiding use of 2x HTTP-sockets
> To: "Bidirectional Streams Over Synchronous HTTP"
> Date: Saturday, 9 January, 2010, 1:50 AM
> On 12/30/09 8:47 AM, Mr
A colleague is unable to send this mail to the list, so proxying it.
Thanks in advance for the responses !
Regards,
Mridul
---
Hi,
I'm trying to better understand the flow of the client read operation in
HBase. I've been looking at a combination of the HBase documents, Lars
George's summar
clarifying !
Regards,
Mridul
Jean-Daniel Cryans wrote:
Use the commands described here:
http://wiki.apache.org/hadoop/Hbase/RollingRestart
J-D
On Fri, Jan 8, 2010 at 11:49 AM, Mridul Muralidharan
wrote:
Hi,
Suppose I want to add a new region server to my instance. I imagine I need
to add it to the
Hi,
Suppose I want to add a new region server to my instance. I imagine I
need to add it to the list in the conf files for Hbase and Hadoop, and
then stop/start the cluster. Is there any way to add the server without
stopping the system?
Thanks,
Mridul