I wasn't clear. Controlling the size of the files is not your real aim, I
suspect; rather, you think that is what is needed to solve an underlying
problem you haven't described. 500 MB is not a big file in itself and is
not an issue for HDFS or MapReduce.
There is no absolute way to know how
Hi,
When running the following query I get multiple records with the same
value of F1:
SELECT F1, COUNT(*)
FROM
(
    SELECT F1, F2, COUNT(*)
    FROM TABLE1
    GROUP BY F1, F2
) a
GROUP BY F1;
As far as I understand, the number of duplicate records corresponds to the
number of reducers.
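For reference, the intended semantics of the nested aggregation above would produce exactly one row per F1 value. A minimal sketch in plain Python (the sample rows are hypothetical, not from the thread):

```python
from collections import Counter

# Hypothetical sample rows of TABLE1 as (F1, F2) pairs.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]

# Inner query: one row per distinct (F1, F2) group.
inner = Counter(rows)  # keys are the distinct (F1, F2) pairs

# Outer query: count the distinct F2 values per F1.
outer = Counter(f1 for (f1, f2) in inner)

print(outer)  # one row per F1: a -> 2, b -> 1
```

If the real query returns several rows for the same F1, something downstream of these semantics (e.g. how the final GROUP BY is merged across reducers) is off.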
Replicating the test
Hi Shahab,
It would be great if someone could delete this email from the Pig group. I
am aware of this mistake and posted the issue to the Hive group almost
immediately.
Regards,
Gourav
On Mon, Jun 10, 2013 at 5:28 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
Gourav, this is not a HIVE mailing
Ignore what I said and see
https://forums.aws.amazon.com/thread.jspa?threadID=51232
bzip2 was documented somewhere as being splittable, but this appears not to
actually be implemented, at least on AWS S3.
/a
On Mon, Jun 10, 2013 at 12:41 PM, Alan Crosswell a...@crosswell.us wrote:
Suggest that
bzip2 is only splittable in newer versions of Hadoop.
On Jun 10, 2013 10:28 PM, Alan Crosswell a...@crosswell.us wrote:
Let's say I have my input data from the past 12 months organized into subdirs
by date:
/data/2012-06-10
/data/2012-06-11
...
/data/2013-06-09
And now say that I want to run a Pig script to process data from a range of
dates within the last 12 months, say 2012-11-07 through 2013-05-26. The
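One common approach (an assumption on my part, not from the thread itself): Pig's LOAD accepts a comma-separated list of paths, so the list can be generated outside Pig and passed in as a parameter. A sketch in Python, assuming the /data/YYYY-MM-DD layout shown above:

```python
from datetime import date, timedelta

def date_range_paths(start, end, base="/data"):
    """Build a comma-separated path list covering [start, end] inclusive,
    suitable for a Pig LOAD statement or a -p INPUT=... parameter."""
    paths = []
    d = start
    while d <= end:
        paths.append("%s/%s" % (base, d.isoformat()))
        d += timedelta(days=1)
    return ",".join(paths)

print(date_range_paths(date(2012, 11, 7), date(2012, 11, 9)))
# -> /data/2012-11-07,/data/2012-11-08,/data/2012-11-09
```

The result can then be interpolated into the script, e.g. `pig -p INPUT="$(python gen_paths.py)" script.pig` (the script name here is hypothetical).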
Hi,
I am currently running Pig from Eclipse against a Hadoop cluster.
I added the Hadoop conf location to the run configuration.
But the MapReduce jobs failed because the built Pig class files cannot be
loaded by Hadoop.
I added the class file location to the classpath, but it did not work.
Any hints?
There are two possibilities that come to mind.
1. Write a custom LoadFunc in which you can handle these regular
expressions. *Not the most ideal solution*
2. Use HCatalog. The example they have in their documentation seems to fit
your use case perfectly.
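The original question's regular expressions aren't quoted above, so purely as an illustration: the core of option 1 is matching candidate file names against a pattern, whether inside a custom LoadFunc or in a pre-processing step that builds the input path list. The pattern and file names below are assumptions for the sketch:

```python
import re

# Assumed example pattern; a real one would come from the original question.
pattern = re.compile(r"part-\d{5}\.log$")

# Hypothetical directory listing.
files = ["part-00000.log", "part-00001.log", "README.txt"]

# Keep only the files the pattern matches, as a LoadFunc (or a path-list
# generator) might before handing paths to Pig.
matched = [f for f in files if pattern.search(f)]
print(matched)  # ['part-00000.log', 'part-00001.log']
```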
Hi,
Forget the question raised before. It's solved.
Thank you for the suggestions.
Writing a custom LoadFunc seems like a valid solution for me, given that I
don't currently have Hive or HCatalog installed and I'm working on more of an
ad-hoc problem at this point.
HCatalog seems like a good solution for doing this type of thing on a repeated