Re: [R] Adding Year-Month-Day to X axis

2018-05-05 Thread Gregory Coats
Jim,
Thank you very much!
How do I instruct staxlab to label once every n days, rather than labeling 
every day?
Greg

> On May 5, 2018, at 6:50 PM, Jim Lemon  wrote:
> 
> staxlab(1,at=x_mmdd,labels=format(x_mmdd,"%Y-%m-%d"))
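A minimal sketch of labelling only every n-th day, assuming the y_duration values from Greg's post. It uses base R's axis.Date() rather than staxlab(), so it does not need plotrix; the spacing n = 2 is an arbitrary choice. The same subset of dates could instead be passed to staxlab(1, at = ..., labels = ...).

```r
# y values from Greg's post; x values as real Date objects
y_duration <- c(301.59050, 387.35700, 365.64366, 317.26150, 321.71883,
                342.44950, 318.95350, 322.33233, 330.60333, 428.99516,
                297.82066)
x_mmdd <- seq(as.Date("2018-04-25"), as.Date("2018-05-05"), by = "day")

n <- 2  # label every n-th day (arbitrary choice)
tick_at <- x_mmdd[seq(1, length(x_mmdd), by = n)]

# Suppress the default x axis, then draw ticks and labels only at tick_at
plot(x_mmdd, y_duration, type = "l", xaxt = "n", xlab = "", ylab = "duration")
axis.Date(1, at = tick_at, format = "%Y-%m-%d")
```

Changing n is the only knob; subsetting the dates before drawing the axis keeps the labels aligned with the actual data points.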



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why the length and width of a plot region produced by the dev.new() function cannot be correctly set?

2018-05-05 Thread sunyeping via R-help

From: Duncan Murdoch
Send Time: 2018 May 6 (Sun) 04:58
To: 孙业平; David Winsemius
Cc: R Help Mailing List
Subject: Re: [R] why the length and width of a plot region produced by the
dev.new() function cannot be correctly set?
On 05/05/2018 11:33 AM, 孙业平 wrote:
> 
> --
> From:Duncan Murdoch 
> Send Time:2018 May 4 (Fri) 17:24
> To:孙业平 ; David Winsemius 
> Cc:R Help Mailing List 
> Subject:Re: [R] why the length and width of a plot region produced by 
> the dev.new() function cannot be correctly set?
> 
> On 04/05/2018 3:04 AM, sunyeping via R-help wrote:
>  >
>  > From: David Winsemius
>  > Send Time: 2018 May 4 (Fri) 13:25
>  > To: 孙业平
>  > Cc: R Help Mailing List
>  > Subject: Re: [R] why the length and width of a plot region produced by
>  > the dev.new() function cannot be correctly set?
>  >
>  >>   On May 3, 2018, at 6:28 PM, sunyeping via R-help wrote:
>  >>
>  >>   When I check the size of the plot region using dev.size("in"), a new
>  >>   plot region is produced and in the R console I get [1] 5.33 5.322917
>  >
>  > Your text is all mangled together. You failed in your duty to read the
>  > list info and the Posting Guide. NO HTML!
>  >
>  >>   If I try to produce a plot region with its size set by
>  >>   dev.new(length=3, width=3), a plot region is produced, but its size
>  >>   is [2.281250, 5.322917], as reported by the dev.size function. If I
>  >>   type dev.new(length=10, width=10), I get a plot region with the size
>  >>   [7.614583, 5.322917]. It seems that the width of the new plot region
>  >>   cannot be set, and it is always 5.322917. The length of the new plot
>  >>   region can be set, but it is always smaller than the value I set.
>  >>   What am I missing? What is the correct way of setting the dimensions
>  >>   of the new plot region? I will be grateful for any help. Best regards,
>  >
>  > The size of the device is not the size of the plot region. You need to
>  > take into account the margins. See ?par
>  > Thank you, David. I have read the par() documentation. Clearly the size
>  > of the plot region is smaller than or equal to the device size. However,
>  > if I produce a graphics device with dev.new(length, width) or other
>  > functions, I find the largest width of the new device is always 5.3
>  > inches whatever values I set, and its length is always smaller than
>  > what I set.
> 
> The length and width aren't the first and second parameters for any
> device, and length isn't a parameter at all.  Try
> 
> dev.new(height = 10, width = 10)
> 
> and you should get a bigger device if it will fit on your screen.  If it
> won't fit, then you might get a smaller one, and you'll need to choose a
> non-screen device such as png() or pdf() instead of the default device.
> 
> Duncan Murdoch
> 
> Could you tell me how to produce a graphics device with the exact size
> that I set? I need this because the graphics device cannot
> accommodate all of the graphs I make with some plotting tools such as
> ggtree. In a ggtree plot, some of the tree tip labels are invisible
> (https://www.dropbox.com/s/87gyusx7ay1xxu8/tree.pdf?dl=0) even when I set
> "par(mar=rep(0,4))". So I think I must plot the tree on a larger graphics
> device. Best regards.
>  >
>  > David Winsemius
>  > Alameda, CA, USA
>  >
>  > 'Any technology distinguishable from magic is insufficiently advanced.'   
>-Gehm's Corollary to Clarke's Third Law
> 
> "dev.new(height = 10, width = 10)" doesn't work either. It produces a
> device with a size of [5.760417, 5.75]. My computer is an ordinary 14-inch
> ThinkPad laptop. Is ~5 inches really the upper limit of the size of an
> R graphics device on a computer screen? I doubt it.

You ask questions in a very rude way.  I'm going to let you figure this 
one out by yourself.

Duncan Murdoch
Sorry, Professor. 
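Duncan's suggestion of a non-screen device can be sketched as follows. A file device such as pdf() takes its width and height in inches and is not limited by the screen, so dev.size() should report the requested size. The file name tree_plot.pdf is hypothetical, and the plot(1:10) call is only a stand-in for the real figure.

```r
# Open a PDF device whose page size is set explicitly in inches.
# "tree_plot.pdf" is a hypothetical file name in a temporary directory.
out_file <- file.path(tempdir(), "tree_plot.pdf")
pdf(out_file, width = 10, height = 10)

# Device size as (width, height) in inches; not capped by the screen
sz <- dev.size("in")

plot(1:10, main = "Stand-in for the real figure")
invisible(dev.off())  # close the device so the file is written

print(sz)
```

The same pattern applies to png(), except that its width and height default to pixels unless units = "in" and a res value are given.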

Re: [R] Adding Year-Month-Day to X axis

2018-05-05 Thread Jim Lemon
Hi Greg,
The only reason I included the staxlab function in the plotrix library
was to fit all the dates onto the axis. If you want to try it:

install.packages("plotrix")

Jim


On Sun, May 6, 2018 at 9:02 AM, Gregory Coats  wrote:
> Jim, Thanks for responding!
> I am using the official R 3.5.0 for Mac OS X.
> This apparently does not include library (plotrix)
>
> library(plotrix)
> Error in library(plotrix) : there is no package called ‘plotrix’
>
> Greg
>
> On May 5, 2018, at 6:50 PM, Jim Lemon  wrote:
>
> Hi Greg,
> What you are getting there is a factor, interpreted as a 1:n sequence
> based on the sort order of your "dates". Here's a way to get dates on
> your x-axis in the format you want:
>
> x_mmdd<-as.Date(c("2018-04-25","2018-04-26","2018-04-27",
> "2018-04-28","2018-04-29","2018-04-30","2018-05-01","2018-05-02",
> "2018-05-03","2018-05-04","2018-05-05"),format="%Y-%m-%d")
> plot(x_mmdd, y_duration, type="l",xaxt="n")
> library(plotrix)
> staxlab(1,at=x_mmdd,labels=format(x_mmdd,"%Y-%m-%d"))
>
> Jim
>
>



Re: [R] Adding Year-Month-Day to X axis

2018-05-05 Thread Bert Gunter
"Apparently, R does not understand my Year-Month-Day "

I think, rather, you need to learn how R handles dates and times.

See here to begin, perhaps:
?DateTimeClasses

There are many R resources for dealing with data over time, many of which
are listed here, and others might be found by online searching.
https://cran.r-project.org/web/views/TimeSeries.html

There are also many tutorials on dealing with time data in R. Even a
cursory web search should find many.

... and of course someone may respond directly to your query here (but not
me, as I'm not that knowledgeable).

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
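To see why the plot in the question looks bizarre, it helps to check what R does with the unquoted values: 2018-04-25 is parsed as the arithmetic 2018 - 4 - 25. This short sketch, using three of the dates from the thread, contrasts that with quoted strings converted by as.Date(), one of the classes described in ?DateTimeClasses.

```r
# Unquoted, each "date" is evaluated as subtraction, e.g. 2018 - 4 - 25
bad_x <- c(2018-04-25, 2018-04-26, 2018-05-01)
print(bad_x)          # 1989 1988 2012: decreasing, then jumping up

# Quoted strings converted with as.Date() become real calendar dates
good_x <- as.Date(c("2018-04-25", "2018-04-26", "2018-05-01"))
print(class(good_x))  # "Date"
print(diff(good_x))   # gaps of 1 and 5 days
```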

On Sat, May 5, 2018 at 11:14 AM, Gregory Coats  wrote:

> I am using R 3.5.0 for Mac OS X.
> Issuing these two commands yields the expected plot.
> y_duration <- c (301.59050,  387.35700,  365.64366,  317.26150,
> 321.71883,  342.44950,  318.95350,  322.33233,  330.60333,  428.99516,
> 297.82066)
> plot (y_duration, type="l")
>
> Adding Year-Month-Day values for the x axis, and then calling plot (x,y),
> yields a bizarre plot. Apparently, R does not understand my Year-Month-Day
> values.
> x_mmdd <- c (2018-04-25, 2018-04-26, 2018-04-27, 2018-04-28,
> 2018-04-29, 2018-04-30, 2018-05-01, 2018-05-02, 2018-05-03, 2018-05-04,
> 2018-05-05)
> plot (x_mmdd, y_duration, type="l")
>
> I would be enormously appreciative of your guidance.
> Greg Coats
> Virginia, USA



Re: [R] Adding Year-Month-Day to X axis

2018-05-05 Thread Jim Lemon
Hi Greg,
What you are getting there is a factor, interpreted as a 1:n sequence
based on the sort order of your "dates". Here's a way to get dates on
your x-axis in the format you want:

x_mmdd<-as.Date(c("2018-04-25","2018-04-26","2018-04-27",
 "2018-04-28","2018-04-29","2018-04-30","2018-05-01","2018-05-02",
 "2018-05-03","2018-05-04","2018-05-05"),format="%Y-%m-%d")
plot(x_mmdd, y_duration, type="l",xaxt="n")
library(plotrix)
staxlab(1,at=x_mmdd,labels=format(x_mmdd,"%Y-%m-%d"))

Jim

On Sun, May 6, 2018 at 4:14 AM, Gregory Coats  wrote:
> I am using R 3.5.0 for Mac OS X.
> Issuing these two commands yields the expected plot.
> y_duration <- c (301.59050,  387.35700,  365.64366,  317.26150,  321.71883,  
> 342.44950,  318.95350,  322.33233,  330.60333,  428.99516,  297.82066)
> plot (y_duration, type="l")
>
> Adding Year-Month-Day values for the x axis, and then calling plot (x,y), 
> yields a bizarre plot. Apparently, R does not understand my Year-Month-Day 
> values.
> x_mmdd <- c (2018-04-25, 2018-04-26, 2018-04-27, 2018-04-28, 2018-04-29, 
> 2018-04-30, 2018-05-01, 2018-05-02, 2018-05-03, 2018-05-04, 2018-05-05)
> plot (x_mmdd, y_duration, type="l")
>
> I would be enormously appreciative of your guidance.
> Greg Coats
> Virginia, USA



[R] Adding Year-Month-Day to X axis

2018-05-05 Thread Gregory Coats
I am using R 3.5.0 for Mac OS X.
Issuing these two commands yields the expected plot.
y_duration <- c (301.59050,  387.35700,  365.64366,  317.26150,  321.71883,  
342.44950,  318.95350,  322.33233,  330.60333,  428.99516,  297.82066)
plot (y_duration, type="l")

Adding Year-Month-Day values for the x axis, and then calling plot (x,y), 
yields a bizarre plot. Apparently, R does not understand my Year-Month-Day 
values.
x_mmdd <- c (2018-04-25, 2018-04-26, 2018-04-27, 2018-04-28, 2018-04-29, 
2018-04-30, 2018-05-01, 2018-05-02, 2018-05-03, 2018-05-04, 2018-05-05)
plot (x_mmdd, y_duration, type="l")

I would be enormously appreciative of your guidance.
Greg Coats
Virginia, USA


Re: [R] Discovering patterns in textual strings

2018-05-05 Thread Bert Gunter
Jeff:

The previous solution I sent you was hugely inefficient and frankly kind of
stupid. Here is a much better and simpler solution.

> z <- c("abc",
   "abc_def",
   "abc.def",
   "abc def",
   "abcd_ef",
   "abcd",
   "e","f")

## Create vector of patterns of same length as z, many of which are repeated
> pats <- sub("^(.+)[. _].*","\\1",z)

## Now can use tapply() to get indices if desired
## Note that the patterns label the groups

> tapply(seq_along(z),pats,I)
$abc
[1] 1 2 3 4

$abcd
[1] 5 6

$e
[1] 7

$f
[1] 8

No need to reply.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
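One detail worth checking against Jeff's real strings: the (.+) in Bert's pattern is greedy, so it captures everything up to the last ".", "_" or space, not the first. If the key is meant to be the text before the first separator, a negated character class is one alternative. This sketch uses Jeff's example strings exactly as he typed them (so "BAdfr" and "BAdrf" form two groups) and is an illustration, not a tested solution for the full data.

```r
# Strings taken verbatim from Jeff's examples in this thread
ad_names <- c("Abc_1232.niok7j9hd", "Abc", "Abc.2#348hfk2.njilo", "Abc.2",
              "Abc.7", "BAdfr_kajdhf98#kjsdh", "BAdrf_gofer")

# Greedy (.+) keeps everything up to the LAST separator
greedy_keys <- sub("^(.+)[. _].*", "\\1", ad_names)

# [^._ ]+ stops at the FIRST ".", "_" or space; strings without a
# separator are kept whole
first_keys <- sub("^([^._ ]+).*$", "\\1", ad_names)

# Group record indices by key, as tapply() does in Bert's version
groups <- split(seq_along(ad_names), first_keys)
print(greedy_keys)
print(groups)
```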

On Sat, May 5, 2018 at 12:14 AM, Bert Gunter  wrote:

> "Does that help?"
>
> No. I am not your private consultant. You need to reply to the list, which
> I have cc'ed here, not just me.
>
> I am still somewhat confused by your specifications, but others may not
> be. Part of my confusion stems from your failure to provide a reproducible
> example (see e.g. the posting guide linked below).  For example, I cannot
> tell from your text whether the Abc and Bce strings contain one or more
> spaces at the end. I shall assume they may but need not.
>
> Anyway, here is a reproducible example and solution that assumes that the
> substrings/patterns of interest to you occur at the beginning of the
> strings and may or may not be followed by one of "." "_" or " "(space) and
> then possibly further text which should be ignored. Assuming that you are
> familiar with regular expressions, maybe this will help to get you started
> even if I have misunderstood your specifications. If you aren't familiar
> with regex's, maybe the stringr package may provide a gentler interface
> than using R's raw regex functionality. Or maybe someone else can suggest a
> better approach (which is another reason why you should reply to the list,
> not just me).
>
> z <- c("abc",
>        "abc_def",
>        "abc.def",
>        "abc def",
>        "abcd_ef",
>        "abcd",
>        "e","f")
>
> pats <- unique(sub("^(.+)[. _]+.*", "\\1", z))
> ## gives:
> > pats
> [1] "abc"  "abcd" "e"    "f"
>
>
> This gives you the four separate patterns that you could then use to group
> your records, perhaps by:
>
> > lapply(pats,function(x)grep(paste0("^", x,"([_. ]|$)"), z))
> [[1]]
> [1] 1 2 3 4
>
> [[2]]
> [1] 5 6
>
> [[3]]
> [1] 7
>
> [[4]]
> [1] 8
>
> That is, indices 1-4 in z are the first group; 5 and 6 are the second; etc.
>
>
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Fri, May 4, 2018 at 9:00 PM, Jeff Reichman 
> wrote:
>
>> Bert
>>
>> Thank you for the  link.  Figured there might be something
>>
>> Regarding your questions
>>
>> This is from a large data set of 53 billion records. The column in
>> question is AdNames (Real Time Bidding data).
>>
>> #1. Generally yes, but not always
>>
>> #2 Separators could be underscores  (_) or dots (.) as in 1.2.3_ABC ..
>>
>> #3 Yes. So "Abc 123" could be a matching string.
>>
>> This would not be considered a match  ...
>> abc_something
>> this.is_a long stringwithabcinthemiddle
>>
>> The sequence(s) are always at the beginning (or so it appears). Out
>> of the 54 billion records I am able to pull (SparkR SQL) 948,679 unique
>> strings. It is from these unique strings that I (if possible) want to
>> identify the "key" strings.
>>
>> 1.  Abc_1232.niok7j9hd
>> 2.  Abc
>> 3.  Abc.2#348hfk2.njilo
>> 4.  Abc.2
>> 5.  Abc.7
>> 6.  BAdfr_kajdhf98#kjsdh
>> 7.  BAdrf_gofer
>> 948679 
>>
>>
>> So I may have a thousand individual strings, all of which have Abc as a
>> common string, or BAdrf. So I am looking to pull "Abc", "BAdrf", etc., so
>> that I can go back and restructure the data to show that any record with
>> Abc_1232.niok7j9hd is part of the Abc "group" or family.
>>
>> Does that help?
>>
>> Jeff
>>
>> -Original Message-
>> From: Bert Gunter 
>> Sent: Friday, May 4, 2018 5:41 PM
>> To: reichm...@sbcglobal.net
>> Cc: R-help 
>> Subject: Re: [R] Discovering patterns in textual strings
>>
>> The answer is, of course, using regular expressions and/or libraries
>> therefor. However, I do not think you have defined your problem
>> sufficiently. Some questions I have:
>>
>> 1. Do possible patterns to be matched always appear at the beginning of
>> your strings?
>>
>> 2. Always together between specified separators ("_"  in your example);
>> or one of several specified separators; or otherwise?
>>
>> 3. Do spaces or other nonprinting characters occur in your strings?
>>
>> e.g. would
>>
>> abc_something
>> this.is_a long stringwithabcinthemiddle
>>
>> be considered 

Re: [R] why the length and width of a plot region produced by the dev.new() function cannot be correctly set?

2018-05-05 Thread Duncan Murdoch

On 05/05/2018 11:33 AM, 孙业平 wrote:


--
From:Duncan Murdoch 
Send Time:2018 May 4 (Fri) 17:24
To:孙业平 ; David Winsemius 
Cc:R Help Mailing List 
Subject:Re: [R] why the length and width of a plot region produced by 
the dev.new() function cannot be correctly set?


On 04/05/2018 3:04 AM, sunyeping via R-help wrote:
 >
 > From: David Winsemius
 > Send Time: 2018 May 4 (Fri) 13:25
 > To: 孙业平
 > Cc: R Help Mailing List
 > Subject: Re: [R] why the length and width of a plot region produced by
 > the dev.new() function cannot be correctly set?
 >
 >>   On May 3, 2018, at 6:28 PM, sunyeping via R-help wrote:
 >>
 >>   When I check the size of the plot region using dev.size("in"), a new
 >>   plot region is produced and in the R console I get [1] 5.33 5.322917
 >
 > Your text is all mangled together. You failed in your duty to read the
 > list info and the Posting Guide. NO HTML!
 >
 >>   If I try to produce a plot region with its size set by
 >>   dev.new(length=3, width=3), a plot region is produced, but its size
 >>   is [2.281250, 5.322917], as reported by the dev.size function. If I
 >>   type dev.new(length=10, width=10), I get a plot region with the size
 >>   [7.614583, 5.322917]. It seems that the width of the new plot region
 >>   cannot be set, and it is always 5.322917. The length of the new plot
 >>   region can be set, but it is always smaller than the value I set.
 >>   What am I missing? What is the correct way of setting the dimensions
 >>   of the new plot region? I will be grateful for any help. Best regards,
 >
 > The size of the device is not the size of the plot region. You need to
 > take into account the margins. See ?par
 > Thank you, David. I have read the par() documentation. Clearly the size
 > of the plot region is smaller than or equal to the device size. However,
 > if I produce a graphics device with dev.new(length, width) or other
 > functions, I find the largest width of the new device is always 5.3
 > inches whatever values I set, and its length is always smaller than
 > what I set.

The length and width aren't the first and second parameters for any
device, and length isn't a parameter at all.  Try

dev.new(height = 10, width = 10)

and you should get a bigger device if it will fit on your screen.  If it
won't fit, then you might get a smaller one, and you'll need to choose a
non-screen device such as png() or pdf() instead of the default device.

Duncan Murdoch

Could you tell me how to produce a graphics device with the exact size
that I set? I need this because the graphics device cannot
accommodate all of the graphs I make with some plotting tools such as
ggtree. In a ggtree plot, some of the tree tip labels are invisible
(https://www.dropbox.com/s/87gyusx7ay1xxu8/tree.pdf?dl=0) even when I set
"par(mar=rep(0,4))". So I think I must plot the tree on a larger graphics
device. Best regards.
 >
 > David Winsemius
 > Alameda, CA, USA
 >
 > 'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law

"dev.new(height = 10, width = 10)" doesn't work either. It produces a
device with a size of [5.760417, 5.75]. My computer is an ordinary 14-inch
ThinkPad laptop. Is ~5 inches really the upper limit of the size of an
R graphics device on a computer screen? I doubt it.


You ask questions in a very rude way.  I'm going to let you figure this 
one out by yourself.


Duncan Murdoch



Re: [R] [Rd] source(echo = TRUE) with a iso-8859-1 encoded file gives an error

2018-05-05 Thread Scott Kostyshak
On Fri, May 04, 2018 at 10:58:26PM +, Ista Zahn wrote:
> On Fri, May 4, 2018 at 4:47 PM, Scott Kostyshak  wrote:
> > I have very little knowledge about file encodings and would like to
> > learn more.
> >
> > I've read the following pages to learn more:
> >
> >   http://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html
> >   https://stackoverflow.com/questions/4806823/how-to-detect-the-right-encoding-for-read-csv
> >   https://developer.r-project.org/Encodings_and_R.html
> >
> > The last one, in particular, has been very helpful. I would be
> > interested in any further references that you suggest.
> >
> > I attach a file that reproduces the issue I would like to learn more
> > about. I do not know if the file encoding will be correctly preserved
> > through email, so I also provide the file (temporarily) on Dropbox here:
> >
> >   https://www.dropbox.com/s/3lbgebk7b5uaia7/encoding_export_issue.R?dl=0
> >
> > The file gives an error when using "source()" with the
> > argument echo = TRUE:
> >
> >   > source("encoding_export_issue.R", echo = TRUE)
> >   Error in nchar(dep, "c") : invalid multibyte string, element 1
> >   In addition: Warning message:
> >   In grepl("^[[:blank:]]*$", dep[1L]) :
> > input string 1 is invalid in this locale
> >
> > The problem comes from the "á" character in the .R file. The file
> > appears to be encoded as "iso-8859-1":
> >
> >   $ file --mime-encoding encoding_export_issue.R
> >   encoding_export_issue.R: iso-8859-1
> >
> > Note that for me:
> >
> >   > getOption("encoding")
> >   [1] "native.enc"
> >
> > so "native.enc" is used for the "encoding" argument of source().
> >
> > The following two calls succeed:
> >
> >   > source("encoding_export_issue.R", echo = TRUE, encoding = "unknown")
> >   > source("encoding_export_issue.R", echo = TRUE, encoding = "iso-8859-1")
> >
> > Is this file a valid "iso-8859-1" encoded file?
> 
> The one you attached is not. The one linked to in dropbox is.
> 
> > Why does source() fail in the case of encoding set to "native.enc"? Is
> > it because of the UTF-8 settings in my locale (see info on my system at
> > the bottom of this email)?
> 
> Yes.
> 
> >
> > I'm guessing it would be a bad idea to put
> >
> >   options(encoding = "unknown")
> >
> > in my .Rprofile, because it is difficult to always correctly guess the
> > encoding of files?
> 
> My guess is that the issue is less about the difficulty of guessing
> the encoding, and more about the time it takes to do so. That's not
> particularly relevant for the "source" function, but the encoding
> option is used by many of the file IO functions in R and so has
> implications well beyond the behavior of "source".

Ah I did not think about this possibility. Makes sense.

> 
> > Is there a reason why setting it to "unknown" would
> > lead to more problems than leaving it set to "native.enc"?
> 
> It depends on what you are actually doing. If you are on a UTF-8
> locale and working exclusively with UTF-8 files, setting
> options(encoding = "unknown") will just slow down your file IO by
> checking for the encoding every time.

Good to know. Thank you for your response, Ista.

Scott


-- 
Scott Kostyshak
Assistant Professor of Economics
University of Florida
https://people.clas.ufl.edu/skostyshak/

> >
> > I've reproduced the above behavior on R-devel (r74677) and 3.4.3. Below
> > is my session info and locale info for my system with the 3.4.3 version:
> >
> >> sessionInfo()
> > R version 3.4.3 (2017-11-30)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: Ubuntu 16.04.3 LTS
> >
> > Matrix products: default
> > BLAS: /usr/lib/libblas/libblas.so.3.6.0
> > LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] 
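The behaviour Scott and Ista describe can be reproduced with a tiny file. This sketch writes a hypothetical script whose bytes are ISO-8859-1 (byte 0xE1 is "á"), confirms those bytes are not valid UTF-8, and then sources it with encoding = "iso-8859-1", one of the two calls Scott reports as working. It assumes a UTF-8 locale like the one in his sessionInfo().

```r
# Write a tiny script whose bytes are Latin-1 (ISO-8859-1): 0xE1 is "á".
# latin1_example.R is a hypothetical stand-in for encoding_export_issue.R.
script <- file.path(tempdir(), "latin1_example.R")
writeBin(c(charToRaw('x <- "m'), as.raw(0xE1), charToRaw('s"\n')), script)

# The raw bytes are not valid UTF-8, which is what trips up a UTF-8
# locale when source() is left at encoding = "native.enc"
raw_text <- rawToChar(readBin(script, "raw", file.size(script)))
print(validUTF8(raw_text))  # FALSE

# Declaring the file's encoding lets source() re-encode it on input
env <- new.env()
source(script, echo = TRUE, encoding = "iso-8859-1", local = env)
print(env$x)
```

Declaring the encoding per call like this avoids the global options(encoding = "unknown") setting that Ista notes would slow down other file IO.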

Re: [R] adding overall constraint in optim()

2018-05-05 Thread Ravi Varadhan
Here is what you do for your problem:



require(BB)

# mo, moWeightsMax, morets and i come from the original poster's code
Mo.vect <- as.vector(tail(head(mo, i), 1))
wgt.vect <- as.vector(tail(head(moWeightsMax, i), 1))
cov.mat <- cov(tail(head(morets, i + 12), 12))

# Objective: minus the Mo.vect-weighted sum divided by w' S w
opt.fun <- function(wgt.vect) -sum(Mo.vect %*% wgt.vect) /
  (t(wgt.vect) %*% (cov.mat %*% wgt.vect))

LowerBounds <- c(0.2, 0.05, 0.1, 0, 0, 0)
UpperBounds <- c(0.6, 0.3, 0.6, 0.15, 0.1, 0.2)

# projectLinear with A = a row of ones, b = 1, meq = 1 imposes sum(w) == 1
spgSolution <- spg(wgt.vect, fn = opt.fun, lower = LowerBounds,
                   upper = UpperBounds, project = "projectLinear",
                   projectArgs = list(A = matrix(1, 1, length(wgt.vect)),
                                      b = 1, meq = 1))





Ravi




From: Ravi Varadhan
Sent: Saturday, May 5, 2018 12:31 PM
To: m.ash...@enduringinvestments.com; r-help@r-project.org
Subject: adding overall constraint in optim()


Hi,

You can use the projectLinear argument in BB::spg to optimize with linear 
equality/inequality constraints.



Here is how you implement the constraint that all parameters sum to 1.



require(BB)

spg(par=p0, fn=myFn, project="projectLinear", projectArgs=list(A=matrix(1, 1, 
length(p0)), b=1, meq=1))



Hope this is helpful,

Ravi




[R] adding overall constraint in optim()

2018-05-05 Thread Ravi Varadhan
Hi,

You can use the projectLinear argument in BB::spg to optimize with linear 
equality/inequality constraints.



Here is how you implement the constraint that all parameters sum to 1.



require(BB)

spg(par=p0, fn=myFn, project="projectLinear", projectArgs=list(A=matrix(1, 1, 
length(p0)), b=1, meq=1))



Hope this is helpful,

Ravi
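A self-contained version of the sum-to-one call above, assuming the BB package (and its dependencies) is installed. The objective myFn, the target vector and the starting point p0 are hypothetical stand-ins. With this quadratic objective the constrained minimum is known in closed form (the target minus its average excess over 1, here 0.2/3 per entry), which gives a simple check on the result.

```r
library(BB)

# Hypothetical objective: squared distance to a target whose entries sum
# to 1.2, so the sum-to-one constraint is active at the optimum
target <- c(0.5, 0.3, 0.4)
myFn <- function(p) sum((p - target)^2)

p0 <- rep(1/3, 3)  # feasible starting point

res <- spg(par = p0, fn = myFn,
           project = "projectLinear",
           projectArgs = list(A = matrix(1, 1, length(p0)), b = 1, meq = 1),
           control = list(trace = FALSE))

print(res$par)       # should be close to target - 0.2/3
print(sum(res$par))  # should be 1 up to numerical tolerance
```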




Re: [R] why the length and width of a plot region produced by the dev.new() function cannot be correctly set?

2018-05-05 Thread 孙业平 via R-help

From: Duncan Murdoch
Send Time: 2018 May 4 (Fri) 17:24
To: 孙业平; David Winsemius
Cc: R Help Mailing List
Subject: Re: [R] why the length and width of a plot region produced by the
dev.new() function cannot be correctly set?
On 04/05/2018 3:04 AM, sunyeping via R-help wrote:
> 
> From: David Winsemius
> Send Time: 2018 May 4 (Fri) 13:25
> To: 孙业平
> Cc: R Help Mailing List
> Subject: Re: [R] why the length and width of a plot region produced by the
> dev.new() function cannot be correctly set?
>
>>   On May 3, 2018, at 6:28 PM, sunyeping via R-help wrote:
>>
>>   When I check the size of the plot region using dev.size("in"), a new
>>   plot region is produced and in the R console I get [1] 5.33 5.322917
>
> Your text is all mangled together. You failed in your duty to read the
> list info and the Posting Guide. NO HTML!
>
>>   If I try to produce a plot region with its size set by
>>   dev.new(length=3, width=3), a plot region is produced, but its size
>>   is [2.281250, 5.322917], as reported by the dev.size function. If I
>>   type dev.new(length=10, width=10), I get a plot region with the size
>>   [7.614583, 5.322917]. It seems that the width of the new plot region
>>   cannot be set, and it is always 5.322917. The length of the new plot
>>   region can be set, but it is always smaller than the value I set.
>>   What am I missing? What is the correct way of setting the dimensions
>>   of the new plot region? I will be grateful for any help. Best regards,
>
> The size of the device is not the size of the plot region. You need to
> take into account the margins. See ?par
> Thank you, David. I have read the par() documentation. Clearly the size
> of the plot region is smaller than or equal to the device size. However,
> if I produce a graphics device with dev.new(length, width) or other
> functions, I find the largest width of the new device is always 5.3
> inches whatever values I set, and its length is always smaller than
> what I set.

The length and width aren't the first and second parameters for any 
device, and length isn't a parameter at all.  Try

dev.new(height = 10, width = 10)

and you should get a bigger device if it will fit on your screen.  If it 
won't fit, then you might get a smaller one, and you'll need to choose a 
non-screen device such as png() or pdf() instead of the default device.

Duncan Murdoch

Could you tell me how to produce a graphics device with the exact size
that I set? I need this because the graphics device cannot
accommodate all of the graphs I make with some plotting tools such as
ggtree. In a ggtree plot, some of the tree tip labels are invisible
(https://www.dropbox.com/s/87gyusx7ay1xxu8/tree.pdf?dl=0) even when I set
"par(mar=rep(0,4))". So I think I must plot the tree on a larger graphics
device. Best regards.
> 
> David Winsemius
> Alameda, CA, USA
> 
> 'Any technology distinguishable from magic is insufficiently advanced.'   
>-Gehm's Corollary to Clarke's Third Law
> 
> 
> 
> 
> 
>  [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

"dev.new(height = 10, width = 10)" doesn't work either. It produces a device 
with a size of [5.760417, 5.75]. My computer is an ordinary 14-inch ThinkPad 
laptop. Is ~5 inches really the upper limit on the size of an R graphics device 
on a computer screen? I doubt it.
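A minimal sketch of Duncan's suggestion to use a non-screen device, in base R: a file device such as pdf() takes width and height in inches and is not limited by the size of the screen, so dev.size("in") reports the requested size. The output file is just a temporary file chosen for illustration. For ggplot2-based plots such as ggtree, ggplot2::ggsave() with width and height arguments serves the same purpose.

```r
# Open a PDF device with an explicit page size in inches.
# A file device is not constrained by the physical screen.
outfile <- tempfile(fileext = ".pdf")
pdf(outfile, width = 10, height = 10)

# Size of the current (PDF) device, in inches
dims <- dev.size("in")

plot(1:10, main = "Drawn on a 10 x 10 inch PDF page")
dev.off()   # close the device and write the file

print(dims)
```

Opening the resulting PDF in a viewer then lets you zoom or scroll over the full page, instead of the plot being squeezed into whatever window size the screen allows.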


[R] error in chol.default((value + t(value))/2) : , the leading minor of order 1 is not positive definite

2018-05-05 Thread Troels Ring
Dear friends - I'm having trouble with nlme fitting the simplified model 
shown below, which gives the error


Error in chol.default((value + t(value))/2) :
  the leading minor of order 1 is not positive definite -

I have seen the threads on this error but it didn't help me solve the 
problem.
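For context on the message itself, a small base-R illustration: chol() computes a Cholesky factorization and stops with this error when a leading square block of the matrix is not positive; "order 1" means the very first diagonal element, [1,1], is already not positive. The call in the traceback symmetrizes a matrix before factoring it, so somewhere during the fit nlme has produced a matrix that is not positive definite; the message alone does not say which parameter is responsible. The exact wording may differ slightly between R versions, so the check below only looks for the common part.

```r
# Symmetric matrix whose [1,1] element is 0: not positive definite
m <- matrix(c(0, 0, 0, 1), nrow = 2)
msg <- tryCatch(chol(m), error = function(e) conditionMessage(e))
print(msg)

# A positive definite matrix factors cleanly:
# crossprod(R) = t(R) %*% R recovers the original matrix
pd <- matrix(c(4, 2, 2, 3), nrow = 2)
R <- chol(pd)
print(crossprod(R))
```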


The model runs well in brms and recovers the parameters used, even with 
fixed effects for TRT. Here in nlme TRT is ignored, and I guess this is not 
the reason for the error.


Below are the (rather clumsy) simulated data set and the call to nlme; 
the start values are taken from the fitted values in brms.


library(ggplot2)
windows(record=TRUE)  # Windows-only graphics device; record=TRUE keeps a plot history
#generate 3*10 rats - add fixed effects to the four parameters
#according to the three groups - add random effects per rat - add
#residual random effect

#Parameter values taken from Sapirstein AJP 181:330-6, 1955


set.seed(1234)
Time <- seq(1,60,by=1)
A <- 275; B <-  140;  g1 <- 0.1105; g2 <- .0161

N <- 30

AA <- rep(A,30)+rnorm(30,0,30);BB <- rep(B,30)+rnorm(30,0,15) ;
gg1 <- rep(g1,30)+rnorm(30,0,0.01); gg2 <- rep(g2,30)+rnorm(30,0,0.001)

TRT <- gl(3,10*60)
levels(TRT) <- c("CTRL","DIAB","HYPER")
AA1 <- AA + c(rep(0,10),rep(10,10),rep(-10,10))
BB1 <- BB + c(rep(0,10),rep(5,10),rep(-5,10))
Gg1 <- gg1 + c(rep(0,10),rep(0.01,10),rep(-0.01,10))
Gg2 <- gg2 + c(rep(0,10),rep(0.005,10),rep(-0.005,10))

getY <- function(A,B,g1,g2) {
Y  <- A*exp(-g1*Time) + B*exp(-g2*Time)
Y <- Y + rnorm(60,0,20)
}
YY <-  c()
for (i in 1:N) YY <- c(YY,getY(AA1[i],BB1[i],Gg1[i],Gg2[i]))
TT <- rep(Time,N)
RAT <- gl(N,length(Time))
dats  <- data.frame(RAT,TRT,TT,YY)
Dats <- dats
names(Dats)[c(3,4)] <- c("Time","Y")
dput(Dats,"dats0505.dat")

with(Dats,plot(Time,Y,pch=19,cex=.1,col=TRT))
ggplot(data=Dats,aes(x=Time,y=Y,group=RAT,col=TRT)) + geom_line()

library(nlme)

gfr.nlme <- nlme(Y ~ A*exp(-Time*g1)+B*exp(-Time*g2),
data = Dats,
fixed = A+g1+B+g2 ~1,
random = A+g1+B+g2 ~1,groups = ~ RAT,
start = c(255,115,130*1e-3,17*1e-3),
na.action = na.omit,verbose=TRUE,control = list(msVerbose = TRUE))
summary(gfr.nlme)
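One thing that may be worth ruling out first (a guess, not a confirmed diagnosis): as far as I know, nlme uses a plain numeric start vector positionally for the fixed effects, in the order they appear in fixed = A+g1+B+g2 ~ 1. The values c(255,115,130*1e-3,17*1e-3) look as if they were written in the order A, B, g1, g2, which would start g1 at 115 and B at 0.13. The base-R sketch below (no model fitting; biexp, y_true, rss and start_reordered are illustrative names) compares how well the same four numbers describe the noise-free simulated curve under each reading. Even with matching start values, random effects on all four parameters may still be hard to estimate from these data.

```r
# Bi-exponential curve used in the simulation and in the nlme formula
biexp <- function(Time, A, g1, B, g2) A * exp(-g1 * Time) + B * exp(-g2 * Time)

Time <- 1:60
# Noise-free curve at the population values used in the simulation
y_true <- biexp(Time, A = 275, g1 = 0.1105, B = 140, g2 = 0.0161)

start <- c(255, 115, 130e-3, 17e-3)

# Reading 'start' in the order of fixed = A + g1 + B + g2 ~ 1
as_formula_order <- biexp(Time, A = start[1], g1 = start[2], B = start[3], g2 = start[4])
# Reading the same numbers as A, B, g1, g2 (apparently what was intended)
as_A_B_g1_g2 <- biexp(Time, A = start[1], B = start[2], g1 = start[3], g2 = start[4])

# Residual sum of squares against the noise-free curve
rss <- function(fit) sum((y_true - fit)^2)
rss_formula_order <- rss(as_formula_order)
rss_A_B_g1_g2 <- rss(as_A_B_g1_g2)
print(c(formula_order = rss_formula_order, A_B_g1_g2 = rss_A_B_g1_g2))

# Hypothetical start vector reordered to match A + g1 + B + g2
start_reordered <- c(A = 255, g1 = 130e-3, B = 115, g2 = 17e-3)
```

If the formula-order reading gives a far larger residual sum of squares, refitting with start = start_reordered would be a cheap next step before simplifying the random-effects structure.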



Re: [ESS] ess-insert-function-outline

2018-05-05 Thread Patrick Connolly
On Fri, 04-May-2018 at 10:23AM +0200, Lionel Henry wrote:

|> 
|> 
|> > On 4 May 2018, at 10:05, Patrick Connolly wrote:
|> > 
|> > That's the same as what's in my lisp/old directory.  What am I to
|> > learn from that?
|> 
|> You can copy-paste its contents into your emacs configuration file.
|> 
.

Many thanks, Lionel.  It works fine now.

And apologies for my ignorance.  

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~}   Great minds discuss ideas
 _( Y )_ Average minds discuss events 
(:_~*~_:)  Small minds discuss people  
 (_)-(_)  . Eleanor Roosevelt
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help


Re: [R] Discovering patterns in textual strings

2018-05-05 Thread Bert Gunter
"Does that help?"

No. I am not your private consultant. You need to reply to the list, which
I have cc'ed here, not just me.

I am still somewhat confused by your specifications, but others may not be.
Part of my confusion stems from your failure to provide a reproducible
example (see e.g. the posting guide linked below).  For example, I cannot
tell from your text whether the Abc and Bce strings contain one or more
spaces at the end. I shall assume they may but need not.

Anyway, here is a reproducible example and solution that assumes that the
substrings/patterns of interest to you occur at the beginning of the
strings and may or may not be followed by one of "." "_" or " "(space) and
then possibly further text which should be ignored. Assuming that you are
familiar with regular expressions, maybe this will help to get you started
even if I have misunderstood your specifications. If you aren't familiar
with regexes, the stringr package may provide a gentler interface
than using R's raw regex functionality. Or maybe someone else can suggest a
better approach (which is another reason why you should reply to the list,
not just me).

z <- c("abc",
   "abc_def",
   "abc.def",
   "abc def",
   "abcd_ef",
   "abcd",
   "e","f")

pats <- unique(sub("^(.+)[. _]+.*", "\\1", z))
## gives:
> pats
[1] "abc"  "abcd" "e"    "f"


This gives you the four separate patterns that you could then use to group
your records, perhaps by:

> lapply(pats,function(x)grep(paste0("^", x,"([_. ]|$)"), z))
[[1]]
[1] 1 2 3 4

[[2]]
[1] 5 6

[[3]]
[1] 7

[[4]]
[1] 8

That is, indices 1-4 in z are the first group; 5 and 6 are the second; etc.
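Applied to the sample AdNames strings Jeff lists further down, the greedy (.+) in the sub() pattern above runs to the last separator, so "Abc_1232.niok7j9hd" yields "Abc_1232" rather than "Abc". If the key is always the text before the first ".", "_" or space (an assumption based on Jeff's description, not something the thread confirms), a character class that excludes the separators stops at the first one. The variable names are illustrative.

```r
# Sample strings copied from Jeff's message
x <- c("Abc_1232.niok7j9hd", "Abc", "Abc.2#348hfk2.njilo", "Abc.2",
       "Abc.7", "BAdfr_kajdhf98#kjsdh", "BAdrf_gofer")

# Greedy pattern from the example above: (.+) extends to the LAST separator
greedy <- sub("^(.+)[. _]+.*", "\\1", x)

# Assumed rule: the key is everything before the FIRST ".", "_" or space.
# [^._ ]+ cannot match a separator, so the capture stops at the first one.
keys <- sub("^([^._ ]+).*", "\\1", x)

# Group the original strings by key and count group sizes
groups <- split(x, keys)
print(greedy)
print(table(keys))
```

Note that "BAdfr" and "BAdrf" come out as two separate keys because they are spelled differently in the sample; sub() is vectorized, so the same call works unchanged on the full vector of unique strings.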



Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, May 4, 2018 at 9:00 PM, Jeff Reichman wrote:

> Bert
>
> Thank you for the  link.  Figured there might be something
>
> Regarding your questions
>
> This is from a large data set of 53 billion records.  The column in question is
> AdNames (Real Time Bidding data)
>
> #1. Generally yes, but not always
>
> #2 Separators could be underscores  (_) or dots (.) as in 1.2.3_ABC ..
>
> #3 Yes. So "Abc 123" could be a matching string
>
> This would not be considered a match  ...
> abc_something
> this.is_a long stringwithabcinthemiddle
>
> The sequence(s) are always at the beginning (or so it appears).  Out
> of the 54 billion records  I am able to pull (SparkR sql) 948,679 unique
> strings.  It is from these unique strings that I (if possible)  want to
> identify the "key" strings.
>
> 1.  Abc_1232.niok7j9hd
> 2.  Abc
> 3.  Abc.2#348hfk2.njilo
> 4.  Abc.2
> 5.  Abc.7
> 6.  BAdfr_kajdhf98#kjsdh
> 7.  BAdrf_gofer
> 948679 
>
>
> So I may have a thousand individual strings, all of which have Abc as a
> common string, or BAdrf.  So I am looking to pull "Abc", "BAdrf", etc., so
> that I can go back and restructure the data to show that any record with
> Abc_1232.niok7j9hd is part of the Abc "group" or family.
>
> Does that help?
>
> Jeff
>
> -----Original Message-----
> From: Bert Gunter 
> Sent: Friday, May 4, 2018 5:41 PM
> To: reichm...@sbcglobal.net
> Cc: R-help 
> Subject: Re: [R] Discovering patterns in textual strings
>
> The answer is, of course, using regular expressions and/or libraries
> therefor. However, I do not think you have defined your problem
> sufficiently. Some questions I have:
>
> 1. Do possible patterns to be matched always appear at the beginning of
> your strings?
>
> 2. Always together between specified separators ("_"  in your example); or
> one of several specified separators; or otherwise?
>
> 3. Do spaces or other nonprinting characters occur in your strings?
>
> e.g. would
>
> abc_something
> this.is_a long stringwithabcinthemiddle
>
> be considered matching?
> There are undoubtedly other possibilities that I've missed.
>
>
>
> You may also find it useful to check this "task view" out for
> possibilities:
> https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, May 4, 2018 at 3:25 PM, Jeff Reichman wrote:
> > R Help Forum
> >
> > Is there an R library (or a way) that I can use to extract unique character
> > strings, or repeating patterns, in textual strings?  Say for example I
> > have the following records:
> >
> > Abc_1234_kjhksh_276
> > Abc
> > Abc_1234_lakdofyo_324
> > Bce_876_skdhk_*&^%*&
> > Bce
> > Bce_454
> >
> > And I would like to see the following results
> >
> > Abc
> > Abc_1234
> > Bce
> >
> > Jeff Reichman
> >