Re: Does Rollups work with spark structured streaming with state.

2021-06-17 Thread Mich Talebzadeh
Great Amit, best of luck. Cheers, Mich

Re: Does Rollups work with spark structured streaming with state.

2021-06-17 Thread Amit Joshi
Hi Mich, Thanks for your email. I have tried it in batch mode and am still looking to try it in streaming mode. Will update you accordingly. Regards, Amit Joshi

Re: class KafkaCluster related errors

2021-06-17 Thread Mich Talebzadeh
This is interesting because I am using PySpark, but I need these jar files for Spark 3.1.1 and Kafka 2.7.0 to work:

kafka-clients-2.7.0.jar
commons-pool2-2.9.0.jar
spark-streaming_2.12-3.1.1.jar
spark-sql-kafka-0-10_2.12-3.1.0.jar

Do you have the equivalent of these artifacts in your POM file? HTH

Re: class KafkaCluster related errors

2021-06-17 Thread Mich Talebzadeh
Hi Kiran, You need kafka-clients for the version of Kafka you are using. So if it is the correct version, keep it. Try running and see what the error says. HTH

Re: Migrating from hive to spark

2021-06-17 Thread Mich Talebzadeh
OK, the first link gives some clues: ... Hive excels in batch disk processing with a MapReduce execution engine. Actually, Hive can also use Spark as its execution engine, which also has a Hive context allowing us to query Hive tables. Despite all the great things Hive can solve, this post is to

Re: Does Rollups work with spark structured streaming with state.

2021-06-17 Thread Mich Talebzadeh
OK, let us start with the basic cube. Create a DF first:

scala> val df = Seq(
     |   ("bar", 2L),
     |   ("bar", 2L),
     |   ("foo", 1L),
     |   ("foo", 2L)
     | ).toDF("word", "num")
df: org.apache.spark.sql.DataFrame = [word: string, num: bigint]

Now try cube on it:

scala>
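As a rough illustration of what a cube aggregation computes on the four rows above, here is a minimal plain-Python sketch. It does not use Spark and is not the command from the truncated message; the helper name `cube_sum` and the choice to sum `num` grouped by `word` are assumptions made only for this example.

```python
from collections import defaultdict
from itertools import combinations


def cube_sum(rows, keys, value):
    """Sum `value` for every subset of `keys` (hypothetical helper).

    A key that is aggregated away is reported as None, similar to the
    null that a cube result shows for a rolled-up column.
    """
    totals = defaultdict(int)
    for size in range(len(keys), -1, -1):
        for kept in combinations(keys, size):
            for row in rows:
                group = tuple(row[k] if k in kept else None for k in keys)
                totals[group] += row[value]
    return dict(totals)


# Same data as the DataFrame in the message above.
rows = [
    {"word": "bar", "num": 2},
    {"word": "bar", "num": 2},
    {"word": "foo", "num": 1},
    {"word": "foo", "num": 2},
]

result = cube_sum(rows, ["word"], "num")
for group, total in sorted(result.items(), key=lambda kv: str(kv[0])):
    print(group, total)
```

With a single key the cube has only two grouping sets: one total per word, plus a grand total over all rows.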

Migrating from hive to spark

2021-06-17 Thread Battula, Brahma Reddy
Hi Talebzadeh, It looks like I caused some confusion, sorry. I have now changed the subject to make it clear. Facebook has tried migrating from Hive to Spark. Check the following links on this. https://www.dcsl.com/migrating-from-hive-to-spark/

RE: Small file problem

2021-06-17 Thread Boris Litvak
Compact them and remove the small files. One messy way of doing this (including some cleanup) looks like the following, based on rdd.mapPartitions() on the file URLs RDD:

import gzip
import json
import logging
import math
import re
from typing import List

import boto3
from mypy_boto3_s3.client

Re: Does Rollups work with spark structured streaming with state.

2021-06-17 Thread Amit Joshi
Hi Mich, Yes, you may think of cube rollups. Let me try to give an example: if we have a stream of data like (country, area, count, time), we would like to get the updated count for different combinations of keys.

> As example -
> (country - count)
> (country, area - count)

We may need to
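The key combinations described above, (country) and (country, area), can be sketched in plain Python as follows. This is only an illustration of the rollup grouping on a fixed list of events; it does not use Spark and says nothing about whether rollup is supported in Structured Streaming with state. The helper name `rollup_count` and the sample events are hypothetical.

```python
from collections import defaultdict


def rollup_count(events, keys):
    """Sum 'count' for each prefix of `keys` (hypothetical helper).

    For keys (country, area) this yields totals per (country, area),
    per (country,) and a grand total. Rolled-up keys appear as None.
    """
    totals = defaultdict(int)
    for event in events:
        for depth in range(len(keys), -1, -1):
            group = tuple(
                event[k] if i < depth else None for i, k in enumerate(keys)
            )
            totals[group] += event["count"]
    return dict(totals)


# Made-up sample events; the time field is omitted for brevity.
events = [
    {"country": "IN", "area": "north", "count": 3},
    {"country": "IN", "area": "south", "count": 2},
    {"country": "US", "area": "east", "count": 5},
]

totals = rollup_count(events, ["country", "area"])
for group, total in sorted(totals.items(), key=lambda kv: str(kv[0])):
    print(group, total)
```

Unlike a cube, a rollup only aggregates along prefixes of the key list, so there is no (area)-only total here.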