Re: Read parquet files as buckets

2017-11-01 Thread Michael Artz
Hi,
   What about the DAG from the resulting "write" call? Can you send that
as well?

On Wed, Nov 1, 2017 at 5:44 AM, אורן שמון  wrote:

> The version is 2.2.0.
> The code for the write is:
> sortedApiRequestLogsDataSet.write
>   .bucketBy(numberOfBuckets, "userId")
>   .mode(SaveMode.Overwrite)
>   .format("parquet")
>   .option("path", outputPath + "/")
>   .option("compression", "snappy")
>   .saveAsTable("sorted_api_logs")
>
> And the code for the read is:
> val df = sparkSession.read.parquet(path).toDF()
>
> The read code runs on a different cluster than the write.
>
> On Tue, Oct 31, 2017 at 7:02 PM Michael Artz 
> wrote:
>
>> What version of Spark? Do you have a code sample? A screenshot of the DAG
>> or the printout from .explain?
>>
>> On Tue, Oct 31, 2017 at 11:01 AM, אורן שמון 
>> wrote:
>>
>>> Hi all,
>>> I have Parquet files as the result of a job that saved them in bucketed
>>> mode by userId. How can I read the files in bucketed mode in another
>>> job? I tried to read them, but it didn't bucket the data (same user in
>>> the same partition).
>>>
>>
>>
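
A likely explanation, for what it's worth: in Spark 2.2 the bucket spec
written by bucketBy/saveAsTable lives in the metastore, not in the Parquet
files themselves, so sparkSession.read.parquet(path) cannot see it. A
minimal, untested sketch of one workaround on the reading cluster:
register an external table over the same path with the same bucket spec
(numberOfBuckets and outputPath here stand in for the writer's actual
values, and the column list may need to be spelled out explicitly):

// Register an external table pointing at the writer's output path; the
// bucket spec must match what the writer used.
sparkSession.sql(s"""
  CREATE TABLE IF NOT EXISTS sorted_api_logs
  USING parquet
  CLUSTERED BY (userId) INTO $numberOfBuckets BUCKETS
  LOCATION '$outputPath/'
""")

// Reading through the catalog, rather than by path, lets the optimizer
// see the bucketing.
val df = sparkSession.table("sorted_api_logs")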


Read parquet files as buckets

2017-10-31 Thread אורן שמון
Hi all,
I have Parquet files as the result of a job that saved them in bucketed mode
by userId. How can I read the files in bucketed mode in another job? I tried
to read them, but it didn't bucket the data (same user in the same partition).
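
To check whether a read actually uses the bucketing (in the spirit of the
.explain suggestion above), one hedged sketch: aggregate on the bucket
column and inspect the plan. If bucketing is in effect, the plan should not
need an Exchange (shuffle) before the aggregate, whereas a plain path-based
parquet read will show one.

// Assumes the table is registered in the catalog under the name the
// writer used ("sorted_api_logs").
val grouped = sparkSession.table("sorted_api_logs").groupBy("userId").count()
grouped.explain()  // no Exchange before the aggregate => bucketing was used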