Hi Ashok,

The schema for your data comes from the data frame you're using in Spark, and it is resolved against the Hive table schema if you are writing to a Hive table. You don't need to configure encodings, because they are selected for your data automatically. For example, Parquet will try dictionary encoding first and fall back to a non-dictionary encoding if it looks like the dictionary encoding would take more space.
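[Editor's note: the fallback behavior described above can be illustrated with a minimal pure-Python sketch. This is not Parquet's actual implementation, and the byte-cost model is a deliberate simplification (one byte per character plus one byte per index); it only shows why a writer would keep dictionary encoding for repetitive columns and abandon it for mostly-unique ones.]

```python
def dictionary_encode(values):
    """Encode a list of strings as (dictionary, indexes into the dictionary)."""
    dictionary = []
    positions = {}
    indexes = []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indexes.append(positions[v])
    return dictionary, indexes

def encoded_size(dictionary, indexes):
    # Simplified cost model: one byte per dictionary character plus one byte
    # per index. Real Parquet uses bit-packed/RLE indexes, but the trade-off
    # is the same in shape.
    return sum(len(v) for v in dictionary) + len(indexes)

def plain_size(values):
    # Plain encoding stores every value in full.
    return sum(len(v) for v in values)

# A repetitive column: the dictionary stays tiny, so indexes win.
repetitive = ["spark", "parquet", "spark", "spark", "parquet"] * 100
# A mostly-unique column: the dictionary is as big as the data, plus indexes.
unique = ["value-%04d" % i for i in range(500)]

for column in (repetitive, unique):
    d, idx = dictionary_encode(column)
    use_dictionary = encoded_size(d, idx) < plain_size(column)
    print("distinct=%d  dictionary wins: %s" % (len(d), use_dictionary))
```

Running this shows the dictionary winning for the repetitive column and losing for the unique one, which is exactly the decision the writer makes per column chunk.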
I recommend writing out a data frame to Parquet and then just taking a look at the result using parquet-tools, which you can download from Maven Central.

rb

On Thu, Mar 3, 2016 at 10:50 PM, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:

> Hi Ted,
>
> Thanks for pointing this out. This page has a mailing list for developers,
> but it does not seem to have one for users yet, so I am including only the
> developers mailing list.
>
> Hi Parquet team,
>
> Could you please clarify the question below? Please let me know if there
> is a separate mailing list for users rather than developers.
>
> Regards,
> Ashok
>
> On Fri, Mar 4, 2016 at 11:01 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Have you taken a look at https://parquet.apache.org/community/ ?
> >
> > On Thu, Mar 3, 2016 at 7:32 PM, ashokkumar rajendran <
> > ashokkumar.rajend...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I am exploring using Apache Parquet with Spark SQL in our project. I
> >> notice that Apache Parquet uses different encodings for different
> >> columns. Dictionary encoding in Parquet looks like a good fit for our
> >> performance needs, but I do not see much documentation in Spark or
> >> Parquet on how to configure it. For example, how would Parquet know the
> >> dictionary of words if the user provides no schema? Where and how do I
> >> specify my schema and configuration for the Parquet format?
> >>
> >> I could not find an Apache Parquet mailing list on the official site.
> >> It would be great if anyone could share it as well.
> >>
> >> Regards,
> >> Ashok

--
Ryan Blue
Software Engineer
Netflix
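[Editor's note: a sketch of the workflow Ryan suggests. The output path, file name, and jar version below are illustrative assumptions, not taken from the thread; adjust them to whatever your Spark job actually wrote and whichever parquet-tools version you downloaded from Maven Central.]

```shell
# First, write a data frame from spark-shell, e.g.:
#   df.write.parquet("/tmp/out")

# Then inspect a resulting file with the parquet-tools jar.
# (version 1.8.1 and the part-file name are illustrative)
java -jar parquet-tools-1.8.1.jar schema /tmp/out/part-00000.parquet

# 'meta' prints per-column-chunk metadata, including the encodings that
# were actually chosen (e.g. PLAIN_DICTIONARY for dictionary-encoded columns).
java -jar parquet-tools-1.8.1.jar meta /tmp/out/part-00000.parquet
```

The `meta` output is the quickest way to confirm whether a given column ended up dictionary-encoded without configuring anything in Spark.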