> ```
> >>> import pandas
> >>> pandas.__version__
> '0.14.0'
> ```
>
> On Thu, Oct 8, 2015 at 10:28 PM, ping yan wrote:
> > I really cannot figure out what this is about..
> > (tried to import pandas, in case that is a dependency, but it didn't help.)
I really cannot figure out what this is about..
(tried to import pandas, in case that is a dependency, but it didn't help.)
```
>>> from pyspark.sql import SQLContext
>>> sqlContext = SQLContext(sc)
>>> sqlContext.createDataFrame(l).collect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
F
```
Hi,
I have a use case where I'd like to mine frequent sequential patterns
(consider the clickpath scenario): the transaction A -> B is not the same as
the transaction B -> A.
From what I understand of FP-growth in general and of the MLlib
implementation, item order is not preserved. Can anyone provide
>worker nodes will not be able to
>execute the *filter* on *innerRDD* as the code in the worker does not
>have access to "sc" and cannot launch a Spark job.
>
>
> Hope it helps. You need to consider List[RDD] or some other collection.
>
> -Kiran
>
> On Tue, Jun 9,
choices left seem to be: 1) groupByKey() and then work with the
ResultIterable object; 2) groupByKey() and then write each group into a
file, and read the files back as individual RDDs to process.
Anyone got a better idea or had a similar problem before?
Thanks!
Ping
--
Ping Yan
Ph.D. in Management
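Option 1 above can be sketched locally in plain Python to show how a per-key iterable would be consumed; the records and the per-group reduction here are hypothetical stand-ins for the real RDD and processing step.

```python
from collections import defaultdict

# Hypothetical (key, value) records, standing in for the RDD's contents.
records = [("u1", "pageA"), ("u2", "pageB"), ("u1", "pageC"), ("u1", "pageA")]

# Local stand-in for rdd.groupByKey(): gather all values per key.
groups = defaultdict(list)
for key, value in records:
    groups[key].append(value)

# Consume each group the way you would a ResultIterable:
# here, a hypothetical per-group reduction (count distinct pages per user).
distinct_per_key = {key: len(set(values)) for key, values in groups.items()}

print(distinct_per_key)  # {'u1': 2, 'u2': 1}
```

The caveat with this option on real data is the usual groupByKey() one: all values for a key are materialized at once, so very large groups can exhaust a worker's memory.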
> the ip
> frequency table. Hope that helps :)
>
>
> On Thursday, May 21, 2015, ping yan wrote:
>
>> I have a dataframe as a reference table for IP frequencies.
>> e.g.,
>>
>> ip freq
>> 10.226.93.67 1
>> 10
22.18', '31.207.6.173', '208.51.22.18'])
freqs = rdd.map(lambda x: df.where(df.ip == x).first())
It doesn't get through. I would appreciate any help.
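The failure is expected: the lambda references `df` inside an RDD transformation, and the closure is shipped to workers that have no driver-side DataFrame context. Since the frequency table is a small reference table, one common workaround is to collect it into a dict on the driver and broadcast that; a minimal pure-Python sketch of the lookup, with hypothetical IPs and frequencies:

```python
# Hypothetical frequency table, as it might look after collecting a small
# reference DataFrame on the driver, e.g. {row.ip: row.freq for row in ...}.
ip_freq = {"10.226.93.67": 1, "208.51.22.18": 4, "31.207.6.173": 2}

# With Spark, this dict would be wrapped in sc.broadcast(ip_freq) and the
# map would read broadcast_var.value; locally a plain dict shows the idea.
ips = ["208.51.22.18", "31.207.6.173", "208.51.22.18"]
freqs = [ip_freq.get(ip, 0) for ip in ips]

print(freqs)  # [4, 2, 4]
```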
Thanks!
Ping
--
Ping Yan
Ph.D. in Management
Dept. of Management Information Systems
University of Arizona
Tucson, AZ 85721