Hi,
What about the DAG? Can you send that as well, from the resulting
"write" call?
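One note on the snippets quoted below: `sparkSession.read.parquet(path)` reads the raw files and ignores the bucketing metadata, which lives only in the catalog/metastore entry created by `saveAsTable`. Reading through `spark.table(...)` keeps the bucketing, assuming the second cluster can see the same metastore. A minimal self-contained sketch (table name, sample data, and local master are illustrative, not from the thread):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Local sketch: write a tiny bucketed table the same way the thread does,
// then read it back through the catalog rather than the raw Parquet path.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("bucketed-read-sketch")
  .getOrCreate()
import spark.implicits._

Seq((1, "a"), (2, "b"), (1, "c")).toDF("userId", "value")
  .write
  .bucketBy(4, "userId")
  .mode(SaveMode.Overwrite)
  .format("parquet")
  .saveAsTable("sorted_api_logs_demo")

// spark.table() consults the catalog, so Spark knows the table is bucketed
// by userId and can avoid a shuffle on userId-keyed joins/aggregations;
// spark.read.parquet(path) on the same files would lose that information.
val df = spark.table("sorted_api_logs_demo")
df.explain() // expect no Exchange on userId for userId-keyed operations
```

If the two clusters do not share a metastore, the bucketing layout on disk is still there, but Spark has no way to know about it from the files alone.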
On Wed, Nov 1, 2017 at 5:44 AM, אורן שמון wrote:
> The version is 2.2.0.
> The code for the write is:
> sortedApiRequestLogsDataSet.write
> .bucketBy(numberOfBuckets, "userId")
> .mode(SaveMode.Overwrite)
> .format("parquet")
> .option("path", outputPath + "/")
> .option("compression", "snappy")
> .saveAsTable("sorted_api_logs")
>
> And the code for the read:
> val df = sparkSession.read.parquet(path).toDF()
>
> The read code runs on a different cluster than the write.
>
> On Tue, Oct 31, 2017 at 7:02 PM Michael Artz
> wrote:
>
>> What version of Spark? Do you have a code sample? A screenshot of the DAG
>> or the printout from .explain?
>>
>> On Tue, Oct 31, 2017 at 11:01 AM, אורן שמון
>> wrote:
>>
>>> Hi all,
>>> I have Parquet files as the result of a job that saved them in
>>> bucket mode by userId. How can I read the files in bucket mode in another
>>> job? I tried to read them, but it didn't bucket the data (same user in the
>>> same partition).
>>>
>>
>>