[R] Save rules with asRules() from Rattle into dataframe

2015-02-23 Thread Kim C.

Hello, 


I use asRules() from the rattle package to extract rules from an rpart decision 
tree: asRules(rpart1). It prints rules such as this one (never mind the data, 
it's merely test data): 


 Rule number: 18 [Product=153 cover=3 (1%) prob=0.00]
   TotalChildren>=4.5
   Education=Bachelors,Partial College,Partial High School
   Gender=F
   Occupation=Skilled Manual


Then I want to save the output, so I use rules <- asRules(rpart1), but the 
result is stored as integers. When I view rules I see only one column of 
integers, not the rule text shown in the example above. Preferably I would like 
to have that data in a data frame. 

The reproducible example can be found here: 

www.mediafire.com/download/8qqzq3qqu2mlmb1/decision+tree+rules+example.RData

The rpart tree is made with this: rpart1 <- rpart(Product ~ ., data=subset5, 
method="class", control=rpart.control(minbucket=2, minsplit=1, cp=-1))

Hope someone can help me out. Thanks! 
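
Since the post reports that the value saved from asRules() holds only integers, one alternative is to build the rule table from the rpart object itself. The sketch below is a minimal illustration under assumptions: it uses the kyphosis data that ships with rpart as a stand-in for the poster's subset5, and the data frame layout (node, prediction, cover, conditions) is made up for this example rather than taken from rattle.

```r
library(rpart)

# Stand-in model: kyphosis ships with rpart; the poster's subset5 is not used here.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Leaf nodes are the rows of fit$frame whose split variable is "<leaf>".
is_leaf <- fit$frame$var == "<leaf>"
leaves  <- as.integer(rownames(fit$frame)[is_leaf])

# path.rpart() returns, for each requested node, the split conditions
# from the root down to that node (the first element is always "root").
paths <- path.rpart(fit, nodes = leaves, print.it = FALSE)

# One row per leaf: node number, predicted class, observation count,
# and the conditions joined into a single string.
rules <- data.frame(
  node       = leaves,
  prediction = attr(fit, "ylevels")[fit$frame$yval[is_leaf]],
  cover      = fit$frame$n[is_leaf],
  conditions = vapply(paths[as.character(leaves)],
                      function(p) paste(p[-1], collapse = " & "),
                      character(1)),
  stringsAsFactors = FALSE
)
print(rules)
```

Each row of rules then corresponds to one leaf, so the table can be filtered or written out like any other data frame.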
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to read this Rpart decision tree?

2015-02-11 Thread Kim C.

Hi all, 

In the attachment or at this link (http://oi58.tinypic.com/35ic9qc.jpg) you'll 
find the decision tree I made. I used the rpart package to build the tree and 
fancyRpartPlot() from the rattle package to plot it. The tree looks different 
from almost every example I have seen before, and I don't understand how I 
should read it. I want to predict Product (these are product keys). The 
predictor variables are age, incomegroup, gender, totalchildren, education, 
occupation, houseownerflag and numberCars. It looks like the upper number is a 
ProductKey. Is n the number of observations? And does the percentage belong to 
the yes/no question below it? 

This is the code I used.  

 # grow a deliberately overfitted tree, then prune at the CP with lowest xerror
 ss.rpart1 <- rpart(Product ~ ., data=sstrain, 
 control=rpart.control(minbucket=2, minsplit=1, cp=-1))
 opt <- which.min(ss.rpart1$cptable[, "xerror"])
 cp <- ss.rpart1$cptable[opt, "CP"]
 ss.rpart2 <- prune(ss.rpart1, cp=cp)
 fancyRpartPlot(ss.rpart2)

So why does the tree look so different from most others (for example: 
http://media.tumblr.com/a9f482ff88b0b9cfaffca7ffd46c6a8e/tumblr_inline_mz7pyuaYJQ1s5wtly.png)?
That one is from Trevor Stephens's Titanic tutorial. Its first node shows that 
62% of 100% don't survive. If they were male, only 19% of them were survivors. 
I find that a lot of examples look like that. Why does mine predict per 
ProductKey, with something different in every node? It doesn't make sense to 
me. It also doesn't have the two numbers like .62 and .38; instead it has 
n=197e+3. So should I read the first node as "for 100% of the observations of 
ProductKey 1074, the incomegroup was moderate"?

Thank you!

Kim
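
One possible explanation, offered only as an assumption because the data are not shown: if Product is stored as a number, rpart() with no method argument fits a regression tree, and each node then reports an average and a count instead of class probabilities. The sketch uses the built-in mtcars data as a stand-in, with the numeric column cyl playing the role of a numeric product key.

```r
library(rpart)

# cyl is numeric in mtcars and stands in for a numeric product key.
fit_numeric <- rpart(cyl ~ mpg + hp, data = mtcars)

# Wrapping the response in factor() makes rpart treat it as a class label.
fit_factor <- rpart(factor(cyl) ~ mpg + hp, data = mtcars)

# The method rpart chose is stored on the fitted object.
print(fit_numeric$method)
print(fit_factor$method)
```

If the same check on the poster's model reports a regression method, converting Product to a factor (or passing method="class") would be the thing to try before plotting again.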



[R] xerror and xstd are missing from cptable of the Rpart package

2015-02-03 Thread Kim C.

Hello all, 
I'm making a decision tree with the rpart package. I want to prune the tree, and 
many tutorials say to use cptable, like so: 
opt <- which.min(model_rpart$cptable[, "xerror"])
The problem is that when I look at model_rpart$cptable it only shows the columns 
CP, nsplit and rel error, so xerror and xstd are missing. How can this be? 
The model looks like this: 
model <- rpart(Product ~ ., data=trainData, 
control=rpart.control(minsplit=50, cp=0.002, xval=0))

Thank you. 
Kim   
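
The call above passes xval=0, and xval is the rpart.control() setting for the number of cross-validations, while xerror and xstd are cross-validation results. A minimal sketch of the difference follows, assuming the kyphosis data shipped with rpart as a stand-in for trainData; the object names are made up for this example.

```r
library(rpart)
set.seed(1)  # cross-validation folds are random

# Same style of call as in the post: cross-validation switched off.
fit_no_cv <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
                   control = rpart.control(cp = 0.002, xval = 0))
print(colnames(fit_no_cv$cptable))

# With xval > 0 the cross-validated error columns are computed.
fit_cv <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
                control = rpart.control(cp = 0.002, xval = 10))
print(colnames(fit_cv$cptable))

# The pruning recipe from the tutorials then has a column to work with.
opt    <- which.min(fit_cv$cptable[, "xerror"])
cp     <- fit_cv$cptable[opt, "CP"]
pruned <- prune(fit_cv, cp = cp)
```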

[R] Decision trees take long or throws memory limit message

2015-01-30 Thread Kim C.

Hi everyone,  
I am trying to build decision trees and random forests using the packages rpart 
and party, and I'm already stuck at the first step. Each time I run the code, 
one of two things happens: 

1. R takes more than an hour. I haven't waited long enough to see if there's a 
result, but it doesn't look like it! When I hit the stop button it also freezes 
and I need to force quit R. This happens with rpart(). 
2. Or I get this message: 

Error: cannot allocate vector of size 5.0 Gb
In addition: Warning messages:
1: In cbind(RET, tr[[i]]) :
  Reached total allocation of 16287Mb: see help(memory.size)
2: In cbind(RET, tr[[i]]) :
  Reached total allocation of 16287Mb: see help(memory.size)
3: In cbind(RET, tr[[i]]) :
  Reached total allocation of 16287Mb: see help(memory.size)
4: In cbind(RET, tr[[i]]) :
  Reached total allocation of 16287Mb: see help(memory.size)

When I look at Windows Task Manager, memory in use goes from 4 GB to 14.5 GB, 
leaving no memory free (15.9 GB is the limit on my computer). The training set 
is big (almost 9 million records and 13 variables). I already increased 
memory.limit() but it didn't help. This happens with ctree() and cforest(). 

I don't have much technical knowledge and I am a beginner at R. I use the 
64-bit version on Windows. 

Examples of the code that I used: 
dt <- rpart(Product ~ Age + TotalChildren + NumberCarsOwned, data=TrainData, 
method="class", control=rpart.control(minsplit=50, cp=0, xval=0))
Formula <- Product ~ Age + TotalChildren + NumberCarsOwned
ctree <- ctree(Formula, data=TrainData)
rf <- cforest(Formula, data=TrainData)

On a smaller data set (of 18,000 records) it does seem to work. How can I make 
it work on my dataset? Is there something in the arguments that I should 
change? 
Kind regards, 
Kim   
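
Because the post says the same code works on 18,000 records, one way to probe where it stops scaling is to fit on a random subsample and time it before trying the full table. This is only a sketch under assumptions: the data frame below is synthetic, with column names copied from the post, and the sample size and cp value are arbitrary choices for illustration.

```r
library(rpart)
set.seed(123)

# Synthetic stand-in for TrainData; only the column names come from the post.
n <- 20000
TrainData <- data.frame(
  Product         = factor(sample(paste0("P", 1:20), n, replace = TRUE)),
  Age             = sample(18:80, n, replace = TRUE),
  TotalChildren   = sample(0:5, n, replace = TRUE),
  NumberCarsOwned = sample(0:4, n, replace = TRUE)
)

# Draw a random subsample of rows and time the fit on it.
idx <- sample.int(nrow(TrainData), size = 5000)
timing <- system.time(
  dt <- rpart(Product ~ Age + TotalChildren + NumberCarsOwned,
              data = TrainData[idx, ], method = "class",
              control = rpart.control(minsplit = 50, cp = 0.001, xval = 0))
)
print(timing)
```

Repeating this with growing values of size gives a rough picture of how run time grows before committing to all 9 million rows.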

[R] Negative length vectors are not allowed error

2015-01-20 Thread Kim C.

Hi all, 

I have a question concerning an error that occurs when using the inspect() 
function on a set of rules made by the apriori function from the arules 
package. 

I have a dataset with 12 million records. It contains some basic sales 
information (such as product, customer data). I want to use the apriori 
function from the arules package on it: 
ruleset <- apriori(sales, parameter=list(support=0.0005, confidence=0.1, 
minlen=2))

It gives me 780379 rules. I want that many rules on purpose, hence the 
parameters being so low (I guess you can reproduce this problem with any large 
dataset and low settings for support and confidence). But then I want to check 
out the rules with inspect. It has a subset because I'm only interested in 
rules with the attribute Product in the rhs: 
inspect(subset(ruleset, subset=rhs %pin% "Product="))

Then this error occurs: 
Error in inspect(subset(sales3ruleset, subset = rhs %pin% "Product=")) : 
  error in evaluating the argument 'x' in selecting a method for function 
'inspect': Error in .Call("R_or_ngCMatrix", x@data, y@data, PACKAGE = "arules") 
: 
  negative length vectors are not allowed

I looked around and apparently the part about "negative length vectors are not 
allowed" means that you want to create a vector that is larger than 2^31. How 
can you get around this limit? Or how can I make the inspect function work in 
this case? 

Thanks in advance! 
Kim 
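
The 2^31 figure mentioned above can be checked with plain arithmetic in base R. The sketch below does not call arules; it shows the largest value an R integer can hold and, purely as an illustrative assumption (the item count of the sales data is not given in the post), how many columns a dense table with 780379 rows could have before its cell count passes that value.

```r
# Largest value an R integer can hold: 2^31 - 1.
int_max <- .Machine$integer.max
print(int_max)

# Adding 1L overflows and yields NA (with a warning, suppressed here).
overflow <- suppressWarnings(int_max + 1L)
print(overflow)

# Hypothetical dense layout: one row per rule, one column per item.
n_rules  <- 780379
max_cols <- floor(int_max / n_rules)   # most columns that still fit
print(max_cols)

# One more column pushes the cell count past the integer range.
print(n_rules * (max_cols + 1) > int_max)
```

Whether the subset operation really builds such a structure internally is not something the post establishes; the numbers only show how quickly 780379 rules can reach the limit.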
 

Re: [R] Negative length vectors are not allowed error

2015-01-20 Thread Kim C.

If the email is still unreadable, please look at the attachment email.txt. 

Dear Martin,

Thanks for your fast reply. Sorry about the formatting of the previous email. I 
actually tested it before sending and it came through fine when sent to my own 
account, but when I sent it to the r-help email address the formatting seems to 
have been lost. 

I can actually send you the R project I am working in, because for this 
experiment I am only using the Contoso DWH test data from Microsoft. 
The R project is too big to include in the attachment, so you can download it 
from my Dropbox via this link: 
https://www.dropbox.com/s/lry1v7ka2env5xf/Example.RData?dl=0

I see I made a small mistake in my description, so I will briefly restate it 
in steps: 
1. ruleset <- apriori(sales, parameter=list(support=0.0005, confidence=0.1, 
   minlen=2)) This gives 780379 rules. 
2. inspect(subset(ruleset, subset=rhs %pin% "Product="))
3. Just inspect(ruleset) does work. When it finishes displaying the 780379 
   rules, it also shows me this message: 
   Warning message:
   closing unused RODBC handle 1 

If I understand the last part correctly, I should add Michael Hahsler's email 
address to the BCC. Otherwise, my apologies. 

Thanks again, 

Kim 

> From: maech...@stat.math.ethz.ch
> Date: Tue, 20 Jan 2015 10:30:34 +0100
> To: minorthre...@hotmail.com
> CC: r-help@r-project.org; maech...@stat.math.ethz.ch
> Subject: Re: [R] Negative length vectors are not allowed error
>
> > Hi all, I have a question concerning an error that occurs when using the
> > inspect() function on a set of rules made by the apriori function from
> > the arules package. I have a dataset with 12 million records. It contains
> > some basic sales information (such as product, customer data). I want to
> > use the apriori function from the arules package on it: ruleset <-
> > apriori(sales, parameter=list(support=0.0005, confidence=0.1, minlen=2))
> > It gives me 780379 rules. I want to have that much rules on purpose, so
> > hence the parameters being so low (I guess you can reproduce this problem
> > with any large dataset and low settings for support and confidence). But
> > then I want to check out the rules with inspect. It has a subset because
> > I'm only interested in rules with the attribute Product in the rhs.
> > inspect(subset(ruleset, subset=rhs %pin% "Product=")) Then this error
> > occurs: Error in inspect(subset(sales3ruleset, subset = rhs %pin%
> > "Product=")) :
> > error in evaluating the argument 'x' in selecting a method for function
> > 'inspect': Error in .Call("R_or_ngCMatrix", x@data, y@data, PACKAGE =
> > "arules") :
> > negative length vectors are not allowed I looked around and apparently
> > that part about negative length vectors are not allowed means that you
> > want to create a vector that is larger than 2^31. How can you get around
> > this limit? Or how can I make the inspect function work in this case?
> > Thanks in advance! Kim
>
> Dear Kim,
>
> if you learned to post (i.e. write that e-mail) in plain text,
> the above would look more humane..
>
> Still, I was able to decipher it and you are right in that
> you hit a limitation of the current setup which may well
> be linked to the Matrix package which I maintain, and on which
> 'arules' depends.
>
> Can you please try to find a reproducible example [with randomly
> generated data; i.e., you'd use set.seed(), runif(), rpois(),
> rmultinom(), rnorm(), ...] so we,
> the maintainer of 'arules' Michael Hahsler (BCC'ed: use
> maintainer("arules") to find such an e-mail address),
> and myself can look if and how that limitation might be lifted.
>
> Best regards,
> Martin Maechler, ETH Zurich

Re: [R] Negative length vectors are not allowed error

2015-01-20 Thread Kim C.

Additionally, it may be interesting to note that I first tried the apriori 
function with support = 0.001 (the rest remained the same). It gave me the 
negative length vector error as well, but after I updated the packages it 
seemed to work, because inspect with subset returned results. With 0.0005, 
however, it still gives me that error. 


From: minorthre...@hotmail.com
To: maech...@stat.math.ethz.ch
Date: Tue, 20 Jan 2015 11:27:44 +0100
CC: r-help@r-project.org; mhahs...@lyle.smu.edu
Subject: Re: [R] Negative length vectors are not allowed error

Dear Martin,

Thanks for your fast reply. Sorry about the formatting of the previous email. I 
actually tested it before sending and it came through fine when sent to my own 
account, but when I sent it to the r-help email address the formatting seems to 
have been lost. 

I can actually send you the R project I am working in, because for this 
experiment I am only using the Contoso DWH test data from Microsoft. The R 
project is too big to include in the attachment, so you can download it from my 
Dropbox via this link: 
https://www.dropbox.com/s/lry1v7ka2env5xf/Example.RData?dl=0

I see I made a small mistake in my description, so I will briefly restate it 
in steps: 
1. ruleset <- apriori(sales, parameter=list(support=0.0005, confidence=0.1, 
   minlen=2)) This gives 780379 rules. 
2. inspect(subset(ruleset, subset=rhs %pin% "Product="))
3. Just inspect(ruleset) does work. When it finishes displaying the 780379 
   rules, it also shows me this message: 
   Warning message:
   closing unused RODBC handle 1 

If I understand the last part correctly, I should add Michael Hahsler's email 
address to the BCC. Otherwise, my apologies. 

Thanks again, 
Kim 

Re: [R] How to get items for both LHS and RHS for only specific columns in arules?

2015-01-16 Thread Kim C.

Sorry, I see that the formatting of the e-mail went all wrong and was 
completely unreadable. You can find a readable version in the attachment and 
down below (if it works this time).
--
Hi all, 

I have a question about the arules package in R. I hope the example tables are 
readable in your email, otherwise you can view it in the question.txt in the 
attachment.

Within the apriori function in the arules package, I want the outcome to 
contain only these two items in the LHS: HouseOwnerFlag=0 and HouseOwnerFlag=1. 
The RHS should only contain attributes from the column Product. For instance:

  lhs                   rhs
1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black}
2 {HouseOwnerFlag=1} => {Product=Adventure Works 26 720p}
3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver}
4 {HouseOwnerFlag=1} => {Product=Contoso Coffee Maker 5C E0900}

So now I use the following: 
rules <- apriori(sales, parameter=list(support=0.01, confidence=0.8, 
minlen=2), appearance=list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))

Then I use this to ensure that only the Product column is on the RHS: 
inspect(subset(rules, subset = rhs %pin% "Product="))

The outcome is like this (for the sake of readability, I omitted the columns 
for support, lift and confidence):

  lhs                                                                    rhs
1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works}   => {Product=SV 16xDVD M360 Black}
2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video} => {Product=Adventure Works 26 720p}
3 {BrandName=Southridge Video, NumberChildrenAtHome=0}                => {Product=Litware Wall Lamp E3015 Silver}
4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170}      => {Product=Contoso Coffee Maker 5C E0900}

So apparently the LHS is able to contain every possible column, not just 
HouseOwnerFlag like I specified. I see that I can put default="rhs" in the 
apriori function to prevent this, like so: 
rules <- apriori(sales, parameter=list(support=0.001, confidence=0.5, 
minlen=2), appearance=list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), 
default="rhs")) 

Then upon inspecting (without the subset part, just inspect(rules)), there are 
far fewer rules (7) than before, but they do indeed only contain 
HouseOwnerFlag in the LHS:

  lhs                   rhs
1 {HouseOwnerFlag=0} => {MaritalStatus=S}
2 {HouseOwnerFlag=1} => {Gender=M}
3 {HouseOwnerFlag=0} => {NumberChildrenAtHome=0}
4 {HouseOwnerFlag=1} => {Gender=M}

However, there's nothing from the column Product in the RHS. So there is no 
use inspecting it with subset, as of course it would return nothing. I tested 
it several times with different support values to see whether Product would 
appear, but the same 7 rules remain.

So my question is, how can I specify both the LHS (HouseOwnerFlag) and RHS 
(Product)? What am I doing wrong?

You can reproduce this problem by downloading this test dataset from the 
attachment or via this link:
https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0 
Mind you, I only took the first 20 rows from a huge dataset (12 million rows), 
so the output here won't have the same product names as the example I displayed 
above. But the problem remains the same. (If you would like to have the entire 
dataset, I can email it of course.) I want to get only HouseOwnerFlag=0 and/or 
HouseOwnerFlag=1 on the LHS and the column Product on the RHS. 

I asked this question on another forum before, but unfortunately got no 
response at all. Since this mailing list is dedicated to R only, I thought you 
guys might be able to help me. 

Thanks in advance! I look forward to hearing from you.

Kim
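
One approach that may address the question, sketched under assumptions: list the allowed items for both sides in the appearance argument and set default="none", so that every item not listed is left out of the rules. The data below are synthetic (only the column names HouseOwnerFlag, Gender and Product echo the post), and the support and confidence thresholds are arbitrary values chosen so that this toy data produces some rules.

```r
library(arules)
set.seed(42)

# Synthetic stand-in for the sales data; values are random.
df <- data.frame(
  HouseOwnerFlag = factor(sample(0:1, 200, replace = TRUE)),
  Gender         = factor(sample(c("F", "M"), 200, replace = TRUE)),
  Product        = factor(sample(c("A", "B", "C"), 200, replace = TRUE))
)
sales <- as(df, "transactions")

# Collect every item label that comes from the Product column.
product_items <- grep("^Product=", itemLabels(sales), value = TRUE)

# Restrict both sides explicitly; default = "none" excludes all other items.
rules <- apriori(sales,
  parameter  = list(support = 0.01, confidence = 0.1, minlen = 2),
  appearance = list(lhs = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
                    rhs = product_items,
                    default = "none"))
inspect(rules)
```

With a confidence threshold as high as the 0.5 or 0.8 used in the post, rules of this shape may simply not exist in the data, so lowering it is part of the experiment.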

From: minorthre...@hotmail.com
To: r-help@r-project.org
Date: Thu, 15 Jan 2015 13:50:54 +0100
Subject: [R] How to get items for both LHS and RHS for only specific columns in 
arules?

Hi all, 

I have a question about the arules package in R. I hope the example tables are 
readable in your email, otherwise you can view it in the question.txt in the 
attachment.

Within the apriori function in the arules package, I want the outcome to only 
contain these two variables in the LHS HouseOwnerFlag=0 and HouseOwnerFlag=1. 
The RHS should only contain attributes from the column Product. For instance:

  lhs                   rhs                                         support confidence lift
1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black}              0.250   0.250      1.00
2 {HouseOwnerFlag=1} => {Product=Adventure Works 26 720p}           0.250   0.250      1.00
3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver}    0.167   0.333      1.33
4 {HouseOwnerFlag=1} =>

[R] How to get items for both LHS and RHS for only specific columns in arules?

2015-01-15 Thread Kim C.

Hi all, 

I have a question about the arules package in R. I hope the example tables are 
readable in your email, otherwise you can view it in the question.txt in the 
attachment.

Within the apriori function in the arules package, I want the outcome to 
contain only these two items in the LHS: HouseOwnerFlag=0 and HouseOwnerFlag=1. 
The RHS should only contain attributes from the column Product. For instance:

  lhs                   rhs                                         support confidence lift
1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black}              0.250   0.250      1.00
2 {HouseOwnerFlag=1} => {Product=Adventure Works 26 720p}           0.250   0.250      1.00
3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver}    0.167   0.333      1.33
4 {HouseOwnerFlag=1} => {Product=Contoso Coffee Maker 5C E0900}     0.167   0.333      1.33

So now I use the following: 
rules <- apriori(sales, parameter=list(support=0.01, confidence=0.8, 
minlen=2), appearance=list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))

Then I use this to ensure that only the Product column is on the RHS: 
inspect(subset(rules, subset = rhs %pin% "Product="))

The outcome is like this (for the sake of readability, I omitted the columns 
for support, lift and confidence):

  lhs                                                                    rhs
1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works}   => {Product=SV 16xDVD M360 Black}
2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video} => {Product=Adventure Works 26 720p}
3 {BrandName=Southridge Video, NumberChildrenAtHome=0}                => {Product=Litware Wall Lamp E3015 Silver}
4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170}      => {Product=Contoso Coffee Maker 5C E0900}

So apparently the LHS is able to contain every possible column, not just 
HouseOwnerFlag like I specified. I see that I can put default="rhs" in the 
apriori function to prevent this, like so: 
rules <- apriori(sales, parameter=list(support=0.001, confidence=0.5, 
minlen=2), appearance=list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), 
default="rhs")) 

Then upon inspecting (without the subset part, just inspect(rules)), there are 
far fewer rules (7) than before, but they do indeed only contain 
HouseOwnerFlag in the LHS:

  lhs                   rhs                       support confidence lift
1 {HouseOwnerFlag=0} => {MaritalStatus=S}         0.250   0.250      1.00
2 {HouseOwnerFlag=1} => {Gender=M}                0.250   0.250      1.00
3 {HouseOwnerFlag=0} => {NumberChildrenAtHome=0}  0.167   0.333      1.33
4 {HouseOwnerFlag=1} => {Gender=M}                0.167   0.333      1.33

However, there's nothing from the column Product in the RHS. So there is no 
use inspecting it with subset, as of course it would return nothing. I tested 
it several times with different support values to see whether Product would 
appear, but the same 7 rules remain.

So my question is, how can I specify both the LHS (HouseOwnerFlag) and RHS 
(Product)? What am I doing wrong?

You can reproduce this problem by downloading this test dataset from the 
attachment (testdf.txt) or via this link:
https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0 
Mind you, I only took the first 20 rows from a huge dataset (12 million rows), 
so the output here won't have the same product names as the example I displayed 
above. But the problem remains the same. (If you would like to have the entire 
dataset, I can email it of course.) I want to get only HouseOwnerFlag=0 and/or 
HouseOwnerFlag=1 on the LHS and the column Product on the RHS. 

I asked this question on another forum before, but unfortunately got no 
response at all. Since this mailing list is dedicated to R only, I thought you 
guys might be able to help me. 

Thanks in advance! I look forward to hearing from you.

Kim 

sales <- structure(list(ProductCategoryName = structure(c(6L, 6L, 2L, 
2L, 2L, 7L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 
2L), .Label = c("Audio", "Cameras and camcorders ", "Cell phones", 
"Computers", "Games and Toys", "Home Appliances", 
"Music, Movies and Audio Books", "TV and Video"), class = "factor"), 
ProductSubcategory = structure(c(26L, 26L, 11L, 12L, 12L, 21L, 
27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 12L, 12L, 12L, 12L, 
12L), .Label = c("Air Conditioners",
